Skip to content

3832. Find Users with Persistent Behavior Patterns

Description

Table: activity

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| user_id      | int     |
| action_date  | date    |
| action       | varchar |
+--------------+---------+
(user_id, action_date, action) is the primary key (unique value) for this table.
Each row represents a user performing a specific action on a given date.

Write a solution to identify behaviorally stable users based on the following definition:

  • A user is considered behaviorally stable if there exists a sequence of at least 5 consecutive days such that:
    • The user performed exactly one action per day during that period.
    • The action is the same on all those consecutive days.
  • If a user has multiple qualifying sequences, only consider the sequence with the maximum length.

Return the result table ordered by streak_length in descending order, then by user_id in ascending order.

The result format is in the following example.

Β 

Example:

Input:

activity table:

+---------+-------------+--------+
| user_id | action_date | action |
+---------+-------------+--------+
| 1       | 2024-01-01  | login  |
| 1       | 2024-01-02  | login  |
| 1       | 2024-01-03  | login  |
| 1       | 2024-01-04  | login  |
| 1       | 2024-01-05  | login  |
| 1       | 2024-01-06  | logout |
| 2       | 2024-01-01  | click  |
| 2       | 2024-01-02  | click  |
| 2       | 2024-01-03  | click  |
| 2       | 2024-01-04  | click  |
| 3       | 2024-01-01  | view   |
| 3       | 2024-01-02  | view   |
| 3       | 2024-01-03  | view   |
| 3       | 2024-01-04  | view   |
| 3       | 2024-01-05  | view   |
| 3       | 2024-01-06  | view   |
| 3       | 2024-01-07  | view   |
+---------+-------------+--------+

Output:

+---------+--------+---------------+------------+------------+
| user_id | action | streak_length | start_date | end_date   |
+---------+--------+---------------+------------+------------+
| 3       | view   | 7             | 2024-01-01 | 2024-01-07 |
| 1       | login  | 5             | 2024-01-01 | 2024-01-05 |
+---------+--------+---------------+------------+------------+

Explanation:

  • User 1:
    • Performed login from 2024-01-01 to 2024-01-05 on consecutive days
    • Each day has exactly one action, and the action is the same
    • Streak length = 5 (meets minimum requirement)
    • The action changes on 2024-01-06, ending the streak
  • User 2:
    • Performed click for only 4 consecutive days
    • Does not meet the minimum streak length of 5
    • Excluded from the result
  • User 3:
    • Performed view for 7 consecutive days
    • This is the longest valid sequence for this user
    • Included in the result

The Results table is ordered by streak_length in descending order, then by user_id in ascending order

Solutions

Solution 1: Filtering + Grouping + Aggregation

We first need to filter user dates with only a single action per day, then identify consecutive intervals among these dates, and finally aggregate these intervals to calculate the streak length and filter records that meet the criteria.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
# Write your MySQL query statement below
WITH
    daily_counts AS (
        -- Step 1: Filter user dates with exactly one record per day (meeting the requirement of "exactly one action per day")
        SELECT
            user_id,
            action_date,
            action,
            COUNT(*) OVER (PARTITION BY user_id, action_date) AS cnt
        FROM activity
    ),
    filtered_activity AS (
        -- Step 2: Filter out data with multiple actions on the same day
        SELECT user_id, action_date, action
        FROM daily_counts
        WHERE cnt = 1
    ),
    streak_groups AS (
        -- Step 3: Group consecutive dates using the method of subtracting row number from date
        SELECT
            user_id,
            action,
            action_date,
            DATE_SUB(
                action_date,
                INTERVAL ROW_NUMBER() OVER (
                    PARTITION BY user_id, action
                    ORDER BY action_date
                ) DAY
            ) AS grp
        FROM filtered_activity
    ),
    streak_summary AS (
        -- Step 4: Calculate the length of each consecutive segment and only keep records with length >= 5
        SELECT
            user_id,
            action,
            COUNT(*) AS streak_length,
            MIN(action_date) AS start_date,
            MAX(action_date) AS end_date,
            -- Sort different streaks for each user to facilitate getting the maximum value later
            ROW_NUMBER() OVER (
                PARTITION BY user_id
                ORDER BY COUNT(*) DESC
            ) AS rnk
        FROM streak_groups
        GROUP BY user_id, action, grp
        HAVING streak_length >= 5
    )
-- Step 5: Extract the longest record for each qualified user and sort
SELECT user_id, action, streak_length, start_date, end_date
FROM streak_summary
WHERE rnk = 1
ORDER BY streak_length DESC, user_id ASC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
import pandas as pd

def find_behaviorally_stable_users(activity: pd.DataFrame) -> pd.DataFrame:
    activity['action_date'] = pd.to_datetime(activity['action_date'])

    # Filter users with only a single action per day
    df = activity.assign(cnt=activity.groupby(['user_id', 'action_date'])['action'].transform('count'))
    df = df[df['cnt'] == 1].sort_values(['user_id', 'action', 'action_date'])

    # Identify consecutive intervals
    df['rn'] = df.groupby(['user_id', 'action'])['action_date'].rank(method='first')
    df['grp'] = df['action_date'] - pd.to_timedelta(df['rn'], unit='D')

    # Aggregate streaks
    streaks = df.groupby(['user_id', 'action', 'grp']).agg(
        streak_length=('action_date', 'count'),
        start_date=('action_date', 'min'),
        end_date=('action_date', 'max')
    ).reset_index()

    # Filter and get the longest streak for each user
    res = streaks[streaks['streak_length'] >= 5].sort_values(
        ['streak_length', 'user_id'], ascending=[False, True]
    )

    return res.groupby('user_id').head(1)[['user_id', 'action', 'streak_length', 'start_date', 'end_date']]

Comments