Overview

Dataset Statistics

Number of Variables 19
Number of Rows 36275
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 10275
Duplicate Rows (%) 28.3%
Total Size in Memory 4.7 MB
Average Row Size in Memory 136.0 B
Variable Types
  • Categorical: 13
  • Numerical: 6
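
The memory figures above can be cross-checked with a little arithmetic. The sketch below assumes that the report's "MB" is a binary megabyte (2**20 bytes), which the report itself does not state. Under that assumption, the average row size multiplied by the row count reproduces the reported total.

```python
# Figures copied from Dataset Statistics above.
n_rows = 36275
avg_row_bytes = 136.0

total_bytes = n_rows * avg_row_bytes

# Assumption: "MB" means 2**20 bytes (the report does not say which unit it uses).
total_binary_mb = round(total_bytes / 2**20, 1)
# For comparison, the same total in decimal megabytes (10**6 bytes).
total_decimal_mb = round(total_bytes / 10**6, 1)

print(total_binary_mb, total_decimal_mb)
```

The binary reading gives 4.7, matching the reported total; the decimal reading gives 4.9.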

Dataset Insights

no_of_week_nights is skewed
lead_time is skewed
no_of_previous_bookings_not_canceled is skewed
avg_price_per_room is skewed
Dataset has 10275 (28.33%) duplicate rows
no_of_adults has constant length 1
no_of_weekend_nights has constant length 1
type_of_meal_plan has constant length 1
required_car_parking_space has constant length 1
room_type_reserved has constant length 1
arrival_year has constant length 4
market_segment_type has constant length 1
repeated_guest has constant length 1
no_of_special_requests has constant length 1
booking_status has constant length 1
date_valid has constant length 1
no_of_week_nights has 2387 (6.58%) zeros
no_of_previous_bookings_not_canceled has 35463 (97.76%) zeros
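
The percentages in the insights above follow directly from the counts and the row total. A minimal sketch of that arithmetic is shown here; the helper name `percent` is illustrative and not part of the report.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)

n_rows = 36275  # Number of Rows from Dataset Statistics

duplicate_pct = percent(10275, n_rows)          # duplicate rows
week_night_zero_pct = percent(2387, n_rows)     # zeros in no_of_week_nights
not_canceled_zero_pct = percent(35463, n_rows)  # zeros in no_of_previous_bookings_not_canceled

print(duplicate_pct, week_night_zero_pct, not_canceled_zero_pct)
```

These evaluate to 28.33, 6.58 and 97.76, matching the insight lines.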

Variables


no_of_adults

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394150
  • The most frequent category (2) occurs over 3.39 times as often as the second most frequent category (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 2
2nd row 2
3rd row 1
4th row 2
5th row 2

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36275
  • The top 2 categories (2, 1) take over 50.0%
  • no_of_adults has words of constant length

no_of_children

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394151
  • The most frequent category (0) occurs over 20.75 times as often as the second most frequent category (1)

Length

Mean 1
Standard Deviation 0.00525
Median 1
Minimum 1
Maximum 2

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36276
  • The top 2 categories (0, 1) take over 50.0%
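
The Decimal Number count for no_of_children is 36276 although the dataset has 36275 rows. This is consistent with the Length statistics (maximum 2, standard deviation 0.00525): one value is two characters long and contributes two digits. The sketch below assumes the report classifies each character by its Unicode general category, which the report does not state; the sample column is hypothetical.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters across all values, keyed by Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            counts[unicodedata.category(ch)] += 1
    return counts

# Hypothetical column: four one-character values and one two-character value.
sample = ["0", "0", "1", "2", "10"]
counts = category_counts(sample)

# "Nd" is the Unicode code for Decimal Number; "Ll", "Lu", "Zs" and "Pd" are
# Lowercase Letter, Uppercase Letter, Space Separator and Dash Punctuation.
print(counts["Nd"], counts["Ll"], counts["Zs"])
```

Five values yield six decimal digits here, in the same way that 36275 rows yield 36276 in the report.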

no_of_weekend_nights

categorical

Approximate Distinct Count 8
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394150
  • The most frequent category (0) occurs over 1.69 times as often as the second most frequent category (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 2
3rd row 2
4th row 0
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36275
  • The top 2 categories (0, 1) take over 50.0%
  • no_of_weekend_nights has words of constant length

no_of_week_nights

numerical

Approximate Distinct Count 18
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 580400
Mean 2.2043
Minimum 0
Maximum 17
Zeros 2387
Zeros (%) 6.6%
Negatives 0
Negatives (%) 0.0%
  • no_of_week_nights is skewed right (γ1 = 1.5993)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 1
Median 2
Q3 3
95-th Percentile 5
Maximum 17
Range 17
IQR 2

Descriptive Statistics

Mean 2.2043
Standard Deviation 1.4109
Variance 1.9907
Sum 79961
Skewness 1.5993
Kurtosis 7.797
Coefficient of Variation 0.6401
  • no_of_week_nights is not normally distributed (p-value 1.5682818121393876e-14)
  • no_of_week_nights has 324 outliers
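
The report does not say which rule it uses to flag outliers. One common convention is Tukey's fences, which flag values more than 1.5 × IQR outside the quartiles. The sketch below applies that rule to the quartiles reported above; it is an assumption and may not reproduce the count of 324. The sample values are hypothetical.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5):
    """Return (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of no_of_week_nights from Quantile Statistics.
lower, upper = iqr_fences(q1=1, q3=3)

# Hypothetical sample of week-night counts, not taken from the dataset.
sample = [0, 1, 2, 2, 3, 5, 6, 7, 17]
outliers = [x for x in sample if x < lower or x > upper]

print(lower, upper, outliers)
```

With Q1 = 1 and Q3 = 3 the fences are -2.0 and 6.0, so under this rule only stays of 7 or more week nights would be flagged.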

type_of_meal_plan

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394150
  • The most frequent category (0) occurs over 5.43 times as often as the second most frequent category (3)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 3
3rd row 0
4th row 0
5th row 3

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36275
  • The top 2 categories (0, 3) take over 50.0%
  • type_of_meal_plan has words of constant length

required_car_parking_space

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394150
  • The most frequent category (0) occurs over 31.27 times as often as the second most frequent category (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36275
  • The top 2 categories (0, 1) take over 50.0%
  • required_car_parking_space has words of constant length

room_type_reserved

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394150
  • The most frequent category (0) occurs over 4.64 times as often as the second most frequent category (3)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36275
  • The top 2 categories (0, 3) take over 50.0%
  • room_type_reserved has words of constant length

lead_time

numerical

Approximate Distinct Count 352
Approximate Unique (%) 1.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 580400
Mean 85.2326
Minimum 0
Maximum 443
Zeros 1297
Zeros (%) 3.6%
Negatives 0
Negatives (%) 0.0%
  • lead_time is skewed right (γ1 = 1.2924)

Quantile Statistics

Minimum 0
5-th Percentile 1
Q1 17
Median 57
Q3 126
95-th Percentile 273
Maximum 443
Range 443
IQR 109

Descriptive Statistics

Mean 85.2326
Standard Deviation 85.9308
Variance 7384.1053
Sum 3.0918e+06
Skewness 1.2924
Kurtosis 1.1793
Coefficient of Variation 1.0082
  • lead_time is not normally distributed (p-value 4.078256925041208e-14)
  • lead_time has 1331 outliers

arrival_year

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2502975
  • The most frequent category (2018) occurs over 4.57 times as often as the second most frequent category (2017)

Length

Mean 4
Standard Deviation 0
Median 4
Minimum 4
Maximum 4

Sample

1st row 2017
2nd row 2018
3rd row 2018
4th row 2018
5th row 2018

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 145100
  • The top 2 categories (2018, 2017) take over 50.0%
  • arrival_year has words of constant length

arrival_month

numerical

Approximate Distinct Count 12
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 580400
Mean 7.4237
Minimum 1
Maximum 12
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • arrival_month is skewed left (γ1 = -0.3482)

Quantile Statistics

Minimum 1
5-th Percentile 2
Q1 5
Median 8
Q3 10
95-th Percentile 12
Maximum 12
Range 11
IQR 5

Descriptive Statistics

Mean 7.4237
Standard Deviation 3.0699
Variance 9.4243
Sum 269293
Skewness -0.3482
Kurtosis -0.9332
Coefficient of Variation 0.4135
  • arrival_month is not normally distributed (p-value 9.35224847491664e-06)

arrival_date

numerical

Approximate Distinct Count 31
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 580400
Mean 15.597
Minimum 1
Maximum 31
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • arrival_date is skewed right (γ1 = 0.0288)

Quantile Statistics

Minimum 1
5-th Percentile 2
Q1 8
Median 16
Q3 23
95-th Percentile 29
Maximum 31
Range 30
IQR 15

Descriptive Statistics

Mean 15.597
Standard Deviation 8.7404
Variance 76.3954
Sum 565781
Skewness 0.02881
Kurtosis -1.1572
Coefficient of Variation 0.5604
  • arrival_date is not normally distributed (p-value 5.521112350878314e-54)

market_segment_type

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394150
  • The most frequent category (4) occurs over 2.2 times as often as the second most frequent category (3)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 3
2nd row 4
3rd row 4
4th row 4
5th row 4

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36275
  • The top 2 categories (4, 3) take over 50.0%
  • market_segment_type has words of constant length

repeated_guest

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394150
  • The most frequent category (0) occurs over 38.01 times as often as the second most frequent category (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36275
  • The top 2 categories (0, 1) take over 50.0%
  • repeated_guest has words of constant length

no_of_previous_cancellations

categorical

Approximate Distinct Count 9
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394179
  • The most frequent category (0) occurs over 181.5 times as often as the second most frequent category (1)

Length

Mean 1.0008
Standard Deviation 0.02826
Median 1
Minimum 1
Maximum 2

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36304
  • The top 2 categories (0, 1) take over 50.0%

no_of_previous_bookings_not_canceled

numerical

Approximate Distinct Count 59
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 580400
Mean 0.1534
Minimum 0
Maximum 58
Zeros 35463
Zeros (%) 97.8%
Negatives 0
Negatives (%) 0.0%
  • no_of_previous_bookings_not_canceled is skewed right (γ1 = 19.2494)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 58
Range 58
IQR 0

Descriptive Statistics

Mean 0.1534
Standard Deviation 1.7542
Variance 3.0771
Sum 5565
Skewness 19.2494
Kurtosis 457.3169
Coefficient of Variation 11.4344
  • no_of_previous_bookings_not_canceled is not normally distributed (p-value 4.231232402160041e-25)
  • no_of_previous_bookings_not_canceled has 812 outliers
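
The very high skewness and coefficient of variation above are typical of a zero-inflated column: most rows are 0 and a few are large. The sketch below shows the effect on a hypothetical sample. It uses population formulas (dividing by n); the report does not state whether it uses population or sample estimators, so small differences from its figures are to be expected.

```python
from statistics import fmean, pstdev

def skewness(xs):
    """Population skewness: third central moment over the cubed standard deviation."""
    mean = fmean(xs)
    sd = pstdev(xs)
    third_moment = fmean([(x - mean) ** 3 for x in xs])
    return third_moment / sd ** 3

def coefficient_of_variation(xs):
    """Standard deviation divided by the mean."""
    return pstdev(xs) / fmean(xs)

# Hypothetical zero-inflated sample: 98 zeros and two non-zero values.
sample = [0] * 98 + [10, 58]
skew = skewness(sample)
cv = coefficient_of_variation(sample)

# The same ratio applied to the rounded figures in Descriptive Statistics.
report_cv = round(1.7542 / 0.1534, 1)

print(skew > 5, cv > 1, report_cv)
```

Because the report's standard deviation and mean are themselves rounded, the recomputed ratio agrees with the reported 11.4344 only to one decimal place.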

avg_price_per_room

numerical

Approximate Distinct Count 3930
Approximate Unique (%) 10.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 580400
Mean 103.4235
Minimum 0
Maximum 540
Zeros 545
Zeros (%) 1.5%
Negatives 0
Negatives (%) 0.0%
  • avg_price_per_room is skewed right (γ1 = 0.6671)

Quantile Statistics

Minimum 0
5-th Percentile 61
Q1 80.3
Median 99.45
Q3 120
95-th Percentile 165
Maximum 540
Range 540
IQR 39.7

Descriptive Statistics

Mean 103.4235
Standard Deviation 35.0894
Variance 1231.2677
Sum 3.7517e+06
Skewness 0.6671
Kurtosis 3.1535
Coefficient of Variation 0.3393
  • avg_price_per_room is not normally distributed (p-value 4.933410647581557e-08)
  • avg_price_per_room has 1696 outliers

no_of_special_requests

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394150
  • The most frequent category (0) occurs over 1.74 times as often as the second most frequent category (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 1
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36275
  • The top 2 categories (0, 1) take over 50.0%
  • no_of_special_requests has words of constant length

booking_status

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394150
  • The most frequent category (1) occurs over 2.05 times as often as the second most frequent category (0)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36275
  • The top 2 categories (1, 0) take over 50.0%
  • booking_status has words of constant length

date_valid

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 2394150
  • The most frequent category (1) occurs over 979.41 times as often as the second most frequent category (0)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 36275
  • The top 2 categories (1, 0) take over 50.0%
  • date_valid has words of constant length

Interactions

Correlations

Missing Values