+-----------------------------+---------+
| Column Name | Type |
+-----------------------------+---------+
| delivery_id | int |
| customer_id | int |
| order_date | date |
| customer_pref_delivery_date | date |
+-----------------------------+---------+
delivery_id is the primary key of this table.
The table holds information about food delivery to customers that make orders at some date and specify a preferred delivery date (on the same order date or after it).
If the customer’s preferred delivery date is the same as the order date, then the order is called immediate; otherwise, it is called scheduled.
The first order of a customer is the order with the earliest order date that the customer made. It is guaranteed that a customer has precisely one first order.
Write an SQL query to find the percentage of immediate orders in the first orders of all customers, rounded to 2 decimal places.
The query result format is in the following example.
Input:
Delivery table:
+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1 | 1 | 2019-08-01 | 2019-08-02 |
| 2 | 2 | 2019-08-02 | 2019-08-02 |
| 3 | 1 | 2019-08-11 | 2019-08-12 |
| 4 | 3 | 2019-08-24 | 2019-08-24 |
| 5 | 3 | 2019-08-21 | 2019-08-22 |
| 6 | 2 | 2019-08-11 | 2019-08-13 |
| 7 | 4 | 2019-08-09 | 2019-08-09 |
+-------------+-------------+------------+-----------------------------+
Output:
+----------------------+
| immediate_percentage |
+----------------------+
| 50.00 |
+----------------------+
Explanation:
The customer id 1 has a first order with delivery id 1 and it is scheduled.
The customer id 2 has a first order with delivery id 2 and it is immediate.
The customer id 3 has a first order with delivery id 5 and it is scheduled.
The customer id 4 has a first order with delivery id 7 and it is immediate.
Hence, half the customers have immediate first orders.
The following is the initial solution I came up with, using CTE and
window function. The choice of CTE is obvious, and the reason for the
window function is that I want to aggregate by the
order_date
column to find the first orders, but I still
need to the customer_pref_delivery_date
column which is not
aggregated.
The immediate_percentage
being asked is essentially just
the average of the logical statement
customer_pref_delivery_date = order_date
(aliased to
immed
). In SQL, true
is 1 and
false
is 0.
Originally I had many other columns in the first_orders
CTE, but I cleaned up by removing all the unused columns to have the
following:
WITH first_orders AS (
SELECT
RANK() OVER(PARTITION BY customer_id ORDER BY order_date) AS order_rank,
customer_pref_delivery_date = order_date AS immed
FROM delivery
)
SELECT ROUND(AVG(first_orders.immed) * 100,2) AS immediate_percentage
FROM first_orders
WHERE order_rank = 1;
FROM
statement)I thought to myself that I should be able to easily “convert” the CTE
into a subquery in the FROM
statement. So I did. But I
consider the query with CTE a bit better because it is a little more
readable.
SELECT
ROUND(
AVG(
order_date=customer_pref_delivery_date
) * 100, 2) AS immediate_percentage
FROM (
SELECT *,
RANK() OVER(PARTITION BY customer_id ORDER BY order_date) AS order_rank
FROM delivery
) AS d_ranked
WHERE order_rank = 1;
WHERE
statement)I also tried to think about whether there is another way to check
whether an order is the first order. So I came up with writing a
subquery in the WHERE
statement using the IN
function of SQL. The subquery returns a table of
customer_id
and the order_date
of the first
order. So the WHERE
statement filters the
delivery
table so that it will only have all the first
orders.
SELECT
ROUND(
AVG(order_date = customer_pref_delivery_date) *
100, 2) AS immediate_percentage
FROM delivery
WHERE (customer_id, order_date) IN (
SELECT
customer_id,
MIN(order_date) AS first_order_date
FROM delivery
GROUP BY customer_id
);
She uses CTE and a subquery in the SELECT
statement, but
her approach does not require any window function. She also calculates
the immediate_percentage
differently.
WITH first_order AS (
SELECT
customer_id
FROM Delivery
GROUP BY customer_id
HAVING MIN(order_date) =
MIN(customer_pref_delivery_date)
)
SELECT
ROUND(
CAST(COUNT(customer_id) AS DECIMAL) /
(
SELECT COUNT(DISTINCT customer_id)
FROM DELIVERY
) * 100,
2) AS immediate_percentage
FROM first_order;
You can see her video explaining the solution here.