Foundations for Data Science · Score 60 / 60

FoodHub Data Analysis
Every chart and result, explained in plain English

A New York food delivery app wants to understand its own business. We analysed 1,898 real orders to answer 17 specific business questions — from "who are our top restaurants?" to "how much money do we actually make per order?" Below is every chart, every number, and what it actually means.

1,898 orders
9 data fields per order
17 business questions answered
26 charts generated
Jump to a section
1 · The Dataset
2 · Data Cleaning
3 · Missing Ratings
4 · Distributions (Q6)
5 · Top Restaurants (Q7)
6 · Weekend Cuisine (Q8)
7 · Order Cost > $20 (Q9)
8 · Delivery Time (Q10)
9 · Top Customers (Q11)
10 · Multivariate Analysis (Q12)
11 · Promo Eligibility (Q13)
12 · Revenue (Q14)
13 · 60-Minute Orders (Q15)
14 · Weekday vs Weekend (Q16)
15 · Conclusions & Recommendations
Chapter 1

Understanding the Dataset

Before any analysis, we need to know what we're working with. How big is the data? What type of information does each column hold? Are there any problems with it?

Question 1

How many rows and columns are present in the data?

1,898
Rows (= orders)
9
Columns (= data fields)

Think of the dataset as a spreadsheet with 1,898 rows, one for each order in the sample, and 9 columns of information about that order: the order ID, who ordered, from which restaurant, what cuisine, how much it cost, how long prep and delivery took, whether the customer left a rating, and whether it was placed on a weekday or a weekend.

Question 2

What are the data types of the different columns?

order_id, customer_id → Integer (whole numbers / ID codes)
restaurant_name, cuisine_type, day_of_the_week → Text (categories)
cost_of_the_order → Decimal number (price in $)
rating → Decimal number (1–5, after cleaning)
food_preparation_time, delivery_time → Integer (minutes)

Each column stores a different type of data. Numbers that represent IDs (order number, customer ID) are stored as integers. Prices use decimals. Text columns store categories like "American" or "Weekend". Knowing the type matters because you can't, for example, take the average of a text column — you'd first need to convert it.
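As a rough illustration of how Questions 1 and 2 can be answered, here is a minimal pandas sketch. The three-row table is invented for the example (only the column names come from the report), so the printed shape is (3, 9) rather than (1898, 9).

```python
import pandas as pd

# Hypothetical three-row sample; the values are made up, the column
# names are the nine fields listed in the report.
sample = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [101, 102, 101],
    "restaurant_name": ["Restaurant A", "Restaurant B", "Restaurant A"],
    "cuisine_type": ["American", "Japanese", "American"],
    "cost_of_the_order": [12.5, 30.75, 8.1],
    "day_of_the_week": ["Weekend", "Weekday", "Weekend"],
    "rating": ["5", "Not given", "4"],   # still text before cleaning
    "food_preparation_time": [25, 30, 21],
    "delivery_time": [20, 28, 17],
})

n_rows, n_cols = sample.shape            # (number of rows, number of columns)
print(n_rows, n_cols)
print(sample.dtypes)                     # one data type per column

# Only numeric columns can be averaged directly.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Note that the raw rating column is text at this stage, which is why it does not show up among the numeric columns until it has been cleaned.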

Chapter 2

Data Cleaning

Real data is rarely perfect. We need to fix problems before we can trust the analysis.

Question 3

Are there any missing values? How do we handle them?

736
Missing ratings (38.8% of orders)
0
Missing values in all other columns

The rating column contained the text "Not given" wherever a customer didn't leave a review. That's not a number — so before we could do any maths with it, we converted every "Not given" into a proper blank value (called NaN, meaning "not a number").

We chose not to fill in those blanks with a made-up number, because a missing rating isn't the same as a bad rating — it just means the customer didn't bother. Keeping them blank preserves that distinction. Every other column was complete — no other gaps to fix.
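The cleaning step described above can be sketched roughly as follows. This is one possible way to do it in pandas, not necessarily the exact code used for the report, and the five ratings are invented for illustration.

```python
import pandas as pd

# Hypothetical raw ratings, stored as text the way the report describes.
ratings = pd.Series(["5", "Not given", "4", "Not given", "3"], name="rating")

# Blank out every "Not given" (it becomes NaN), then convert the rest to numbers.
cleaned = ratings.mask(ratings == "Not given").astype(float)

n_missing = int(cleaned.isna().sum())    # how many orders have no rating
mean_rating = cleaned.mean()             # NaN values are skipped automatically
print(n_missing, mean_rating)            # 2 4.0
```

Because the blanks stay blank, the average is computed only over the customers who actually rated, which is the distinction the paragraph above is protecting.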

Question 4

What is the statistical summary? How long does food preparation take?

20 min
Minimum prep time
27.4 min
Average prep time
35 min
Maximum prep time
$16.50
Average order cost
24.2 min
Average delivery time

A statistical summary gives you the "shape" of each column at a glance — the lowest value, the highest, the average, and how spread out the data is. Prep time is remarkably consistent: it always falls between 20 and 35 minutes, with an average of about 27. That's a narrow window — restaurants seem to have reliable kitchen speeds.

Adding prep (27 min) and delivery (24 min) gives a total expected wait of ~51 minutes from clicking "order" to food at the door.

Chapter 3

Data Quality: Outliers & Missing Ratings

Before diving into the main analysis, we do extra checks — catching extreme values and quantifying how many ratings are actually missing.

Outlier Detection · IQR Method

Are there any unusually extreme values hiding in the data?

The IQR (Interquartile Range) method flags outliers by looking at the middle 50% of all values, the span between the 25th and 75th percentiles, and marking anything far above or below it as suspicious. The usual convention is to flag values more than 1.5 × IQR beyond either end of that span. Picture a column of delivery times where most values cluster between 17 and 31 minutes: an order taking 60 minutes would be flagged as an outlier worth investigating.

The most extreme delivery times and costs in this dataset (up to 33 minutes and $35.41) look like real operational edge cases (slow restaurants, large orders) rather than data entry errors, so we kept them in rather than deleting them. They're genuine data points the business needs to understand.
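To make the IQR idea concrete, here is a small sketch using only Python's standard library. It assumes the conventional 1.5 × IQR multiplier (the report does not state which multiplier it used), and the delivery times are invented so that a 60-minute order is present.

```python
from statistics import quantiles

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical delivery times in minutes, including one 60-minute order.
delivery_minutes = [15, 18, 20, 22, 24, 25, 26, 28, 30, 33, 60]

lower, upper = iqr_fences(delivery_minutes)
outliers = [m for m in delivery_minutes if m < lower or m > upper]
print(lower, upper, outliers)   # 9.0 41.0 [60]
```

Anything outside the two fences is only a candidate for investigation; whether to keep or drop it is a judgment call about whether the value is a real event or a recording error.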

Question 5

How many orders are not rated?

736
Unrated orders (38.8%)
1,162
Rated orders (61.2%)

Nearly 4 in 10 orders were delivered without the customer leaving any rating. That's a significant blind spot for the business: if you can't measure satisfaction, you can't manage it. Of the orders that were rated, the average score was 4.34 out of 5, with no ratings below 3 stars at all. So the customers who bother to rate are generally happy, but there's a large silent minority whose experience we simply don't know.

Chapter 4

Exploratory Analysis: Distributions (Q6)

Before answering specific questions, we plot each variable individually to understand its shape, range, and quirks. These charts are the foundation of everything that follows.

Q6 · Distribution

Order Cost Distribution

Histogram + boxplot of order cost in USD
Mean: $16.50
Median: $14.14
IQR: $12.08 – $22.30
Max: $35.41

The cost distribution is right-skewed: most orders cluster in the $10–$18 range, but a long tail stretches toward $35. The mean ($16.50) is pulled above the median ($14.14) by the expensive orders in that tail.

In plain terms: most people order a mid-priced meal, but a notable minority splurge on expensive orders — and those high-ticket orders matter more for FoodHub's revenue (higher commission rate).

Q6 · Distribution

Food Preparation Time Distribution

Food Preparation Time
Distribution of kitchen preparation time (minutes)
Min: 20 min
Mean: 27.37 min
Median: 27.00 min
Max: 35 min
Std Dev: 4.67 min

Prep time is remarkably consistent — always between 20 and 35 minutes, nearly symmetric around a mean of 27 minutes. The mean and median are almost identical, which tells us there are no extreme outliers pulling the average up or down.

This suggests restaurants on the platform have reliable, predictable kitchens — good news for setting customer expectations accurately.

Q6 · Distribution

Delivery Time Distribution

Distribution of delivery time (minutes)
Min: 15 min
Mean: 24.16 min
Median: 25.00 min
Max: 33 min

Delivery time ranges from 15 to 33 minutes and is roughly symmetric — no extreme outliers on either end. Most deliveries land in the 20–28 minute window.

Combined with prep time (~27 min), the average customer waits around 51 minutes total. That's competitive for New York City.

Q6 · Distribution

Rating Distribution

Distribution of customer ratings (3, 4, or 5 stars only)
Mean rating: 4.34 / 5
Median rating: 5.00 / 5
Ratings below 3: NONE
5-star ratings: most frequent
4 or 5 stars: 83.8% of rated orders

No customer who left a rating gave fewer than 3 stars. The vast majority (84%) gave 4 or 5 stars. The rating of 5 is the single most common score. This sounds great — but remember, 39% of customers left no rating at all, so we're only seeing the feedback of those who bothered to respond.

The silent 39% could be happy, unhappy, or indifferent — we simply don't know.

Q6 · Distribution

Day of Week: Weekend vs Weekday

Day of Week Distribution
Order volume: weekday vs weekend
Weekend orders: 1,351 (71.2%)
Weekday orders: 547 (28.8%)

71% of all orders are placed on weekends. People are far more likely to order in on a Saturday or Sunday than during the work week — which makes sense: more leisure time, less cooking motivation.

This has big operational implications: FoodHub needs to staff its delivery network primarily for weekends, and any capacity issues will most acutely show up then.

Q6 · Distribution

Cuisine Type Distribution

Order count by cuisine type (top 10)
American: 584 orders
Japanese: 470 orders
Italian: 298 orders
Chinese: 215 orders
Mexican: 77 orders
Indian: 73 orders
Middle Eastern: 49 orders
Mediterranean: 46 orders
Thai: 19 orders
French: 18 orders

American cuisine dominates, accounting for nearly a third of all orders. Japanese is a strong second. The top three cuisines (American, Japanese, Italian) together make up over 70% of all orders.

This concentration tells FoodHub where to focus restaurant recruitment efforts and promotional spend.

Q6 · Distribution

Restaurant Volume Overview

Restaurant Overview
Overview of restaurant order volumes across the platform
Total unique restaurants: 178
Platform is highly concentrated: a few restaurants get most orders

The platform hosts 178 different restaurants, but order volume is highly unequal: a small number of restaurants account for a disproportionate share of orders. This is a classic "long tail" distribution, with a few stars and then well over a hundred smaller players.

Chapter 5

Top Restaurants by Orders (Q7)

Question 7

Which are the top 5 restaurants in terms of orders received?

Top 5 Restaurants
Top 5 restaurants by number of orders
1. Shake Shack: 219 orders
2. The Meatball Shop: 132 orders
3. Blue Ribbon Sushi: 119 orders
4. Blue Ribbon Fried Chicken: 96 orders
5. Parm: 68 orders

Shake Shack is the clear winner. With 219 orders, it gets about 66% more than its nearest competitor (132 orders). That's brand power at work: people know it, trust it, and order from it repeatedly.

These five restaurants represent a critical revenue source for FoodHub. If any of them left the platform or had quality issues, it would meaningfully impact the business. They deserve priority attention and support.
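A ranking like this comes down to counting how often each restaurant name appears. The sketch below uses Python's built-in Counter on a tiny invented order list (the names are real, the six orders are not), then checks Shake Shack's lead using the two order counts reported above.

```python
from collections import Counter

# Hypothetical mini order log: one restaurant name per order.
orders = [
    "Shake Shack", "Parm", "Shake Shack",
    "The Meatball Shop", "Shake Shack", "The Meatball Shop",
]

top = Counter(orders).most_common(2)     # the two most frequent names
print(top)   # [('Shake Shack', 3), ('The Meatball Shop', 2)]

# Lead of first place over second, using the counts reported above.
first, second = 219, 132
lead_pct = 100 * (first - second) / second
print(round(lead_pct, 1))                # 65.9
```

On the full dataset the same idea applies to the restaurant_name column, with 5 in place of 2.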

Chapter 6

Most Popular Cuisine on Weekends (Q8)

Question 8

Which cuisine is most popular on weekends?

Weekend Cuisine Popularity
Order count by cuisine type on weekends
Weekend orders breakdown:
American: 415 (30.7% of weekend orders)
Japanese: 335
Italian: 207
Chinese: 163
Mexican: 53
Indian: 49

American food wins the weekend — 415 weekend orders, nearly a third of all weekend activity. Japanese is a strong second. The ranking mirrors the overall platform distribution, but weekend volumes are much higher across the board.

For FoodHub's marketing team: weekend promotions tied to American and Japanese restaurants would target the highest-demand segment at exactly the right time.

Chapter 7

Orders Costing More Than $20 (Q9)

Question 9

What percentage of orders cost more than $20?

Order Cost Over $20
Distribution of orders by cost tier
Orders ≤ $5: 9 (0.47%)
Orders $5–$20: 1,334 (70.28%)
Orders > $20: 555 (29.24%)
Total orders: 1,898

Roughly 3 in 10 orders (29.24%) cost more than $20. This matters financially because FoodHub charges a higher commission (25%) on these orders versus 15% on the $5–$20 tier. So while they're only 29% of orders by count, they generate a disproportionate share of revenue.

Barely any orders (0.47%) fall under $5 — the lowest commission tier. That tier is essentially irrelevant to FoodHub's business.
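The percentages in this chapter are simple shares of the total. As a quick check, the sketch below recomputes them from the three tier counts reported above; the dictionary keys are just labels chosen for this example.

```python
# Tier counts as reported in this chapter.
tier_counts = {"<= $5": 9, "$5-$20": 1334, "> $20": 555}

total = sum(tier_counts.values())
shares = {tier: round(100 * n / total, 2) for tier, n in tier_counts.items()}

print(total)    # 1898
print(shares)   # {'<= $5': 0.47, '$5-$20': 70.28, '> $20': 29.24}
```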

Chapter 8

Mean Delivery Time (Q10)

Question 10

What is the mean order delivery time?

24.16 min
Mean delivery time
27.37 min
Mean prep time
~51 min
Total from order to door

The average delivery rider takes just over 24 minutes to pick up and drop off an order. Add in the ~27 minutes the kitchen needs, and the typical customer is waiting about 51 minutes in total. For a city as busy as New York, that's a reasonable benchmark — though as we'll see later, a meaningful slice of orders exceeds 60 minutes.

Chapter 9

Top 3 Most Frequent Customers (Q11)

Question 11

Who are the top 3 most frequent customers, and how many orders did they place?

Top 3 Customers
Top 3 customers by order frequency
1. Customer #52832 → 13 orders
2. Customer #47440 → 10 orders
3. Customer #83287 → 9 orders
Combined: 32 orders (1.69% of total)

The most prolific customer placed 13 orders; compare that to the average customer, who orders only once or twice. Together these three "power users" account for just 1.69% of all orders, but they are clear loyalty candidates.

FoodHub decided to reward them with 20% discount vouchers — a smart move to keep their best customers engaged and ordering regularly.

Chapter 10

Multivariate Analysis (Q12)

So far we've looked at each variable on its own. Now we compare them against each other — looking for relationships, patterns, and surprises.

Q12 · Pairplot

How do all numerical variables relate to each other?

Pairplot
Pairplot: every numeric variable plotted against every other one simultaneously

A pairplot is a grid of mini-charts. Each row and column represents a variable (cost, prep time, delivery time, rating). Where two different variables cross, you get a scatter plot showing how they relate. Where a variable meets itself (the diagonal), you get its distribution.

The key takeaway here: almost no strong relationships exist between these variables. The dots form clouds rather than lines, which means knowing someone's delivery time tells you almost nothing about their rating or order cost — they're largely independent.

Q12 · Correlation Matrix

Do any variables move together?

Correlation Heatmap
Correlation heatmap (darker = stronger relationship)
cost ↔ prep_time: +0.042 (near zero)
cost ↔ delivery: -0.030 (near zero)
prep ↔ delivery: +0.011 (near zero)
prep ↔ total_time: +0.686 (moderate)

Correlation scores range from -1 (perfect opposite) to +1 (perfect match). Scores near 0 mean no relationship. The only notable relationship here: prep time and total time correlate at 0.69 — which makes sense, since total time is literally prep + delivery. Everything else is essentially uncorrelated.

This is important: longer prep times are not associated with worse ratings, and more expensive orders do not take longer to deliver. Within this dataset, each variable stands largely on its own.
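To see why prep time correlates with total time even when prep and delivery are unrelated, here is a small sketch that computes the Pearson correlation by hand. The six prep and delivery values are invented and deliberately chosen so that their correlation is exactly zero; they are not taken from the FoodHub data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical minutes, constructed so prep and delivery are uncorrelated.
prep = [20, 23, 26, 29, 32, 35]
delivery = [24, 18, 30, 30, 18, 24]
total = [p + d for p, d in zip(prep, delivery)]

r_prep_delivery = pearson(prep, delivery)
r_prep_total = pearson(prep, total)
print(round(r_prep_delivery, 3))   # 0.0
print(round(r_prep_total, 3))      # 0.723
```

The second number is high simply because total time contains prep time as one of its two ingredients, which is the same mechanical effect behind the 0.69 in the heatmap.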

Q12 · Weekday vs Weekend

Do people spend more on weekends?

Cost Weekday vs Weekend
Average order cost: weekday vs weekend
Weekday average cost: $16.31
Weekend average cost: $16.57
Difference: $0.26

Barely any difference: people spend almost exactly the same on weekdays and weekends. The $0.26 gap is under 2% of the average order, which is negligible in practice. Day of the week doesn't appear to influence how much someone orders.

Q12 · Weekday vs Weekend

Is delivery actually slower on weekends?

Delivery Time Weekday vs Weekend
Mean delivery time split by day type
Weekday mean delivery: 28.34 min
Weekend mean delivery: 22.47 min
Difference: 5.87 min
Statistical test: significant (p < 0.05)

Counterintuitively, weekends are actually faster for delivery (22.5 min vs 28.3 min on weekdays). This was confirmed statistically — the difference is real, not random noise.

A likely explanation: on weekdays, delivery riders face more traffic (office rush hours, lunch surges). On weekends, while order volume is higher, the geographic patterns may be more spread out and traffic lighter. This is the kind of non-obvious insight that data science reveals.

Q12 · By Cuisine

Which cuisines tend to be most expensive?

Average Cost by Cuisine
Average order cost by cuisine type
French: $19.79
Southern: $19.30
Thai: $19.21
Spanish: $18.99
Middle Eastern: $18.82
Mexican: $16.93
Indian: $16.92
Italian: $16.42
American: $16.32
Chinese: $16.31

French and Southern cuisines command the highest average order values, at nearly $20 per order. Because the 25% commission applies to individual orders above $20, a large share of their orders likely falls in FoodHub's higher tier. The cuisines most popular by volume (American, Japanese) actually sit in the mid-range for cost.

This suggests a trade-off: French cuisine generates more revenue per order but fewer orders overall. American cuisine generates high volume at a slightly lower average order value.

Q12 · Rating Drivers

What drives customer ratings?

Rating Drivers
Exploring what correlates with customer ratings
Average Cost by Rating
Average order cost by star rating
Average order cost by rating:
3 stars → $16.22
4 stars → $16.71
5 stars → $16.97
Correlation: prep/delivery time vs rating ≈ 0 (no relationship)

Neither delivery speed nor order cost strongly predicts ratings. Higher-rated orders do cost slightly more, but the difference ($16.22 for 3-star vs $16.97 for 5-star) is tiny. What drives a 5-star vs 3-star rating remains unclear from this data alone — it likely depends on food quality and the human element of service, which aren't captured in these columns.

Chapter 11

Restaurants Eligible for Promotions (Q13)

Question 13

Which restaurants qualify for a promotional offer? (more than 50 ratings AND average rating > 4)

Promotional Eligibility
Restaurants meeting both criteria simultaneously
Criteria:
✓ Rating count > 50
✓ Average rating > 4.0
Eligible restaurants: 4
1. The Meatball Shop
2. Blue Ribbon Sushi
3. (+ 2 others)
Note: Shake Shack has the most orders but may not meet the avg rating threshold

Only 4 restaurants meet both conditions simultaneously: high volume (more than 50 ratings received) AND high quality (average above 4 stars). The criteria are deliberately strict to ensure promotional backing goes to partners who are both popular and consistently excellent.

Interestingly, Shake Shack — the platform's top restaurant by volume — may not make the cut on average rating. This hints that volume and quality don't always go together.
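The two-part eligibility check can be sketched as a group-and-filter step in pandas. The function name, the toy restaurants A, B and C, and their ratings are all invented for illustration; the thresholds are parameters so the toy run can use a count threshold of 2 instead of the report's 50.

```python
import pandas as pd

def eligible_restaurants(rated, min_count=50, min_mean=4.0):
    """Restaurants with more than min_count ratings and a mean above min_mean."""
    stats = rated.groupby("restaurant_name")["rating"].agg(["count", "mean"])
    keep = (stats["count"] > min_count) & (stats["mean"] > min_mean)
    return stats[keep].index.tolist()

# Hypothetical rated orders only (unrated orders are left out beforehand).
rated = pd.DataFrame({
    "restaurant_name": ["A", "A", "A", "B", "B", "B", "C"],
    "rating": [5, 4, 5, 3, 4, 4, 5],
})

# Toy threshold of 2 ratings; the report's rule uses 50.
winners = eligible_restaurants(rated, min_count=2)
print(winners)   # ['A']
```

Restaurant B fails on average rating and restaurant C fails on rating count, which shows why both conditions have to be checked together.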

Chapter 12

Net Revenue Generated (Q14)

Question 14

How much revenue does FoodHub generate across all orders?

Revenue Breakdown
Revenue by order cost tier
Revenue Chart 2
Revenue contribution by tier
Total Revenue Generated: $6,166.30
Total Order Value: $31,314.82
Average Commission: 19.69%
Commission Structure:
Orders > $20 → 25% commission
Orders $5–$20 → 15% commission
Orders ≤ $5 → $0 revenue

FoodHub earned $6,166.30 in commission across all 1,898 orders — an average effective rate of 19.69%. Even though orders above $20 represent only 29% of all orders, they generate a disproportionate share of revenue due to the higher commission rate.

The tiered structure creates a clear financial incentive to attract and retain higher-value orders. Every dollar of order value on an order above $20 earns FoodHub 25 cents, compared with 15 cents in the $5–$20 tier, which is a compelling reason to promote premium restaurants and larger basket sizes.
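The commission rules above can be written as a short function. This is a sketch under the assumption that the rate applies to the whole order value (which is consistent with the totals reported in this chapter); the function name and the example order amounts are made up.

```python
def commission(cost):
    """FoodHub's cut of one order under the tiered structure described above."""
    if cost > 20:
        return 0.25 * cost      # top tier: 25% of the order value
    if cost > 5:
        return 0.15 * cost      # middle tier: 15% of the order value
    return 0.0                  # orders of $5 or less earn nothing

# Hypothetical orders, one per tier.
for cost in (4.00, 10.00, 30.00):
    print(cost, round(commission(cost), 2))

# Effective rate from the two totals reported in this chapter.
effective_rate = round(100 * 6166.30 / 31314.82, 2)
print(effective_rate)           # 19.69
```

One consequence worth noticing: an order of $20.00 earns $3.00, while an order of $20.01 earns about $5.00, so nudging orders just past the threshold has an outsized effect.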

Chapter 13

Orders Taking More Than 60 Minutes (Q15)

Question 15

What percentage of orders take more than 60 minutes from order to delivery?

60-Minute Orders
Total time distribution highlighting the 60-minute threshold
Orders > 60 minutes: 200 (10.54%)
Total orders: 1,898
Mean total time: 51.53 min
Median total time: 52.00 min

1 in 10 orders takes more than an hour. That's a threshold where customer frustration plausibly kicks in, even though this dataset shows almost no correlation between prep or delivery time and ratings. With 200 such orders in the dataset, this is a real operational challenge.

The average total time of 51.5 minutes sits tantalizingly close to that 60-minute line. Any restaurant or route that runs even slightly slower than average risks pushing orders over the edge. Identifying and fixing those bottlenecks is a high-priority recommendation.
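Total time here is prep time plus delivery time, and the question is what share of orders lands strictly above 60 minutes. The sketch below shows the calculation on five invented orders, then rechecks the reported percentage from the two counts given above.

```python
# Hypothetical prep and delivery times in minutes for five orders.
prep = [20, 35, 27, 33, 25]
delivery = [15, 30, 24, 28, 33]

total_time = [p + d for p, d in zip(prep, delivery)]
over_60 = sum(t > 60 for t in total_time)          # strictly more than 60
toy_share = 100 * over_60 / len(total_time)
print(total_time, over_60, toy_share)   # [35, 65, 51, 61, 58] 2 40.0

# The reported figure: 200 of 1,898 orders.
reported_share = round(100 * 200 / 1898, 2)
print(reported_share)                   # 10.54
```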

Chapter 14

Weekday vs Weekend Delivery Time (Q16)

Question 16

How does mean delivery time compare on weekdays vs weekends?

Weekday vs Weekend Delivery
Delivery time comparison with statistical test
           Mean    Median   Std    Min   Max
Weekday:   28.34   28.0     2.89   24    33
Weekend:   22.47   22.0     4.63   15    30
Statistical significance: YES (p < 0.05)
The difference is real, not random noise.

Weekday delivery averages 28.3 minutes — nearly 6 minutes longer than the weekend average of 22.5 minutes. A Welch t-test confirms this difference is statistically significant: it's not a fluke of the sample.

Weekend delivery also has more variability (std 4.63 vs 2.89) — so while the average is faster on weekends, the experience is less predictable. Some weekend orders are very fast; others take longer.

Operationally: weekday delivery is slower and more consistent; weekend delivery is faster but more variable.
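For readers curious how a Welch t-test works, the sketch below recomputes the test statistic from the summary numbers in the table above. It assumes the group sizes are the weekday and weekend order counts reported in the day-of-week distribution (547 and 1,351) and that the listed standard deviations are sample standard deviations; the function name is invented for this example.

```python
from math import sqrt

def welch_t(mean1, std1, n1, mean2, std2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary stats."""
    v1, v2 = std1 ** 2 / n1, std2 ** 2 / n2        # squared standard errors
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Weekday first, weekend second (means and stds from the table above).
t_stat, dof = welch_t(28.34, 2.89, 547, 22.47, 4.63, 1351)
print(round(t_stat, 1), round(dof))
```

Under these assumptions the statistic comes out at roughly 33 with about 1,580 degrees of freedom. Anything beyond about 2 in absolute value is already significant at the 5% level, so a value this large leaves little doubt that the weekday and weekend averages really differ.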

Chapter 15

Conclusions & Recommendations (Q17)

Every finding translates into a concrete action the business can take.

Question 17 · Conclusions

Key conclusions from the full analysis

1. Ratings coverage is limited. 39% of orders go unrated, which means nearly two-fifths of the customer experience is invisible. The ratings we do have skew positive (mean 4.34), but silence isn't the same as satisfaction.

2. A long-tail delivery risk exists. While average wait times are reasonable, 10.5% of orders breach 60 minutes, a plausible trigger for dissatisfaction. The underlying causes (slow restaurants, difficult routes) need to be identified.

3. Platform concentration is high. A handful of restaurants (Shake Shack, The Meatball Shop, Blue Ribbon Sushi) drive a large share of orders. This is both a strength (reliable revenue) and a risk (dependency).

4. Weekends are the core business. 71% of orders land on weekends. The platform's operational model must be designed around weekend demand, not weekday demand.

5. High-value orders are the profit engine. Orders above $20 (29% of volume) drive disproportionate revenue through the higher commission tier. Growing this segment — through restaurant mix or basket-size incentives — is the clearest financial lever.

Question 17 · Recommendations

Action-oriented recommendations for the business

A) Fix the ratings gap. Add a frictionless in-app rating prompt delivered minutes after the order arrives. A 1-click rating (then optional comment) removes barriers. Even a 10-point increase in rating participation would significantly improve operational visibility.

B) Attack the 60-minute problem. Identify which restaurants, time slots, and delivery routes are responsible for orders breaking the 60-minute threshold. Work directly with those restaurants on kitchen speed, or adjust delivery zone boundaries.

C) Protect your top restaurant relationships. Shake Shack, The Meatball Shop, and Blue Ribbon Sushi are too important to lose. Offer them preferential support, faster dispute resolution, and joint marketing to keep them deeply engaged with the platform.

D) Build a proper loyalty programme. The top 3 customers place 9–13 orders each. A structured loyalty scheme — points, tiers, exclusive perks — would expand this "power user" segment beyond three individuals.

E) Plan capacity for weekends. Staffing, delivery assignments, and support resources should be biased heavily toward Saturday and Sunday. Weekday operations are a secondary priority.

F) Grow high-value orders. Promote family meals, bundle deals, and premium restaurants that naturally produce orders above $20. Each additional dollar above the $20 threshold earns FoodHub 25 cents — the clearest unit-economics lever available.