Skip to content

3497. Analyze Subscription Conversion

Description

Table: UserActivity

+------------------+---------+
| Column Name      | Type    | 
+------------------+---------+
| user_id          | int     |
| activity_date    | date    |
| activity_type    | varchar |
| activity_duration| int     |
+------------------+---------+
(user_id, activity_date, activity_type) is the unique key for this table.
activity_type is one of ('free_trial', 'paid', 'cancelled').
activity_duration is the number of minutes the user spent on the platform that day.
Each row represents a user's activity on a specific date.

A subscription service wants to analyze user behavior patterns. The company offers a 7-day free trial, after which users can subscribe to a paid plan or cancel. Write a solution to:

  1. Find users who converted from free trial to paid subscription
  2. Calculate each user's average daily activity duration during their free trial period (rounded to 2 decimal places)
  3. Calculate each user's average daily activity duration during their paid subscription period (rounded to 2 decimal places)

Return the result table ordered by user_id in ascending order.

The result format is in the following example.

 

Example:

Input:

UserActivity table:

+---------+---------------+---------------+-------------------+
| user_id | activity_date | activity_type | activity_duration |
+---------+---------------+---------------+-------------------+
| 1       | 2023-01-01    | free_trial    | 45                |
| 1       | 2023-01-02    | free_trial    | 30                |
| 1       | 2023-01-05    | free_trial    | 60                |
| 1       | 2023-01-10    | paid          | 75                |
| 1       | 2023-01-12    | paid          | 90                |
| 1       | 2023-01-15    | paid          | 65                |
| 2       | 2023-02-01    | free_trial    | 55                |
| 2       | 2023-02-03    | free_trial    | 25                |
| 2       | 2023-02-07    | free_trial    | 50                |
| 2       | 2023-02-10    | cancelled     | 0                 |
| 3       | 2023-03-05    | free_trial    | 70                |
| 3       | 2023-03-06    | free_trial    | 60                |
| 3       | 2023-03-08    | free_trial    | 80                |
| 3       | 2023-03-12    | paid          | 50                |
| 3       | 2023-03-15    | paid          | 55                |
| 3       | 2023-03-20    | paid          | 85                |
| 4       | 2023-04-01    | free_trial    | 40                |
| 4       | 2023-04-03    | free_trial    | 35                |
| 4       | 2023-04-05    | paid          | 45                |
| 4       | 2023-04-07    | cancelled     | 0                 |
+---------+---------------+---------------+-------------------+

Output:

+---------+--------------------+-------------------+
| user_id | trial_avg_duration | paid_avg_duration |
+---------+--------------------+-------------------+
| 1       | 45.00              | 76.67             |
| 3       | 70.00              | 63.33             |
| 4       | 37.50              | 45.00             |
+---------+--------------------+-------------------+

Explanation:

  • User 1:
    • Had 3 days of free trial with durations of 45, 30, and 60 minutes.
    • Average trial duration: (45 + 30 + 60) / 3 = 45.00 minutes.
    • Had 3 days of paid subscription with durations of 75, 90, and 65 minutes.
    • Average paid duration: (75 + 90 + 65) / 3 = 76.67 minutes.
  • User 2:
    • Had 3 days of free trial with durations of 55, 25, and 50 minutes.
    • Average trial duration: (55 + 25 + 50) / 3 = 43.33 minutes.
    • Did not convert to a paid subscription (only had free_trial and cancelled activities).
    • Not included in the output because they didn't convert to paid.
  • User 3:
    • Had 3 days of free trial with durations of 70, 60, and 80 minutes.
    • Average trial duration: (70 + 60 + 80) / 3 = 70.00 minutes.
    • Had 3 days of paid subscription with durations of 50, 55, and 85 minutes.
    • Average paid duration: (50 + 55 + 85) / 3 = 63.33 minutes.
  • User 4:
    • Had 2 days of free trial with durations of 40 and 35 minutes.
    • Average trial duration: (40 + 35) / 2 = 37.50 minutes.
    • Had 1 day of paid subscription with duration of 45 minutes before cancelling.
    • Average paid duration: 45.00 minutes.

The result table only includes users who converted from free trial to paid subscription (users 1, 3, and 4), and is ordered by user_id in ascending order.

Solutions

Solution 1: Grouping + Conditional Filtering + Equi-Join

First, we filter the data in the table to exclude all records where activity_type is equal to cancelled. Then, we group the remaining data by user_id and activity_type, calculate the duration duration for each group, and store the results in table T.

Next, we filter table T to extract records where activity_type is free_trial and paid, storing them in tables F and P, respectively. Finally, we perform an equi-join on these two tables using user_id, filter the required fields as per the problem statement, and sort the results to produce the final output.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# Write your MySQL query statement below
WITH
    T AS (
        SELECT user_id, activity_type, ROUND(SUM(activity_duration) / COUNT(1), 2) duration
        FROM UserActivity
        WHERE activity_type != 'cancelled'
        GROUP BY user_id, activity_type
    ),
    F AS (
        SELECT user_id, duration trial_avg_duration
        FROM T
        WHERE activity_type = 'free_trial'
    ),
    P AS (
        SELECT user_id, duration paid_avg_duration
        FROM T
        WHERE activity_type = 'paid'
    )
SELECT user_id, trial_avg_duration, paid_avg_duration
FROM
    F
    JOIN P USING (user_id)
ORDER BY 1;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
import pandas as pd


def analyze_subscription_conversion(user_activity: pd.DataFrame) -> pd.DataFrame:
    df = user_activity[user_activity["activity_type"] != "cancelled"]

    df_grouped = (
        df.groupby(["user_id", "activity_type"])["activity_duration"]
        .mean()
        .add(0.0001)
        .round(2)
        .reset_index()
    )

    df_free_trial = (
        df_grouped[df_grouped["activity_type"] == "free_trial"]
        .rename(columns={"activity_duration": "trial_avg_duration"})
        .drop(columns=["activity_type"])
    )

    df_paid = (
        df_grouped[df_grouped["activity_type"] == "paid"]
        .rename(columns={"activity_duration": "paid_avg_duration"})
        .drop(columns=["activity_type"])
    )

    result = df_free_trial.merge(df_paid, on="user_id", how="inner").sort_values(
        "user_id"
    )

    return result

Comments