跳转至

3808. Find Emotionally Consistent Users

题目描述

Table: reactions

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| user_id      | int     |
| content_id   | int     |
| reaction     | varchar |
+--------------+---------+
(user_id, content_id) is the primary key (unique value) for this table.
Each row represents a reaction given by a user to a piece of content.

Write a solution to identify emotionally consistent users based on the following requirements:

  • For each user, count the total number of reactions they have given.
  • Only include users who have reacted to at least 5 different content items.
  • A user is considered emotionally consistent if at least 60% of their reactions are of the same type.

Return the result table ordered by reaction_ratio in descending order and then by user_id in ascending order.

Note:

  • reaction_ratio should be rounded to 2 decimal places

The result format is in the following example.

 

Example:

Input:

reactions table:

+---------+------------+----------+
| user_id | content_id | reaction |
+---------+------------+----------+
| 1       | 101        | like     |
| 1       | 102        | like     |
| 1       | 103        | like     |
| 1       | 104        | wow      |
| 1       | 105        | like     |
| 2       | 201        | like     |
| 2       | 202        | wow      |
| 2       | 203        | sad      |
| 2       | 204        | like     |
| 2       | 205        | wow      |
| 3       | 301        | love     |
| 3       | 302        | love     |
| 3       | 303        | love     |
| 3       | 304        | love     |
| 3       | 305        | love     |
+---------+------------+----------+

Output:

+---------+-------------------+----------------+
| user_id | dominant_reaction | reaction_ratio |
+---------+-------------------+----------------+
| 3       | love              | 1.00           |
| 1       | like              | 0.80           |
+---------+-------------------+----------------+

Explanation:

  • User 1:
    • Total reactions = 5
    • like appears 4 times
    • reaction_ratio = 4 / 5 = 0.80
    • Meets the 60% consistency requirement
  • User 2:
    • Total reactions = 5
    • Most frequent reaction appears only 2 times
    • reaction_ratio = 2 / 5 = 0.40
    • Does not meet the consistency requirement
  • User 3:
    • Total reactions = 5
    • 'love' appears 5 times
    • reaction_ratio = 5 / 5 = 1.00
    • Meets the consistency requirement

The Results table is ordered by reaction_ratio in descending order, then by user_id in ascending order.

解法

方法一:分组统计 + 连接查询

我们首先统计每个用户对每种反应的计数,记录在临时表 \(t\) 中。然后,我们在临时表 \(t\) 的基础上,统计每个用户的最大反应计数和总反应计数,计算出反应比例,并筛选出满足条件的用户,记录在临时表 \(s\) 中。最后,我们将临时表 \(s\)\(t\) 进行连接查询,找出每个用户的主导反应,并按要求排序输出结果。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Write your MySQL query statement below
WITH
    t AS (
        SELECT
            user_id,
            reaction,
            COUNT(1) cnt
        FROM reactions
        GROUP BY 1, 2
    ),
    s AS (
        SELECT
            user_id,
            MAX(cnt) mx_cnt,
            ROUND(MAX(cnt) / SUM(cnt), 2) reaction_ratio
        FROM t
        GROUP BY 1
        HAVING reaction_ratio >= 0.60 AND SUM(cnt) >= 5
    )
SELECT user_id, reaction dominant_reaction, reaction_ratio
FROM
    s
    JOIN t USING (user_id)
WHERE cnt = mx_cnt
ORDER BY 3 DESC, 1;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
import pandas as pd
from decimal import Decimal, ROUND_HALF_UP


def find_emotionally_consistent_users(reactions: pd.DataFrame) -> pd.DataFrame:
    t = reactions.groupby(["user_id", "reaction"]).size().reset_index(name="cnt")

    s = (
        t.groupby("user_id")
        .agg(mx_cnt=("cnt", "max"), total_cnt=("cnt", "sum"))
        .reset_index()
    )

    s["reaction_ratio"] = (
        s["mx_cnt"]
        .div(s["total_cnt"])
        .apply(
            lambda x: float(
                Decimal(str(x)).quantize(Decimal("0.00"), rounding=ROUND_HALF_UP)
            )
        )
    )

    s = s[(s["reaction_ratio"] >= 0.60) & (s["total_cnt"] >= 5)]

    merged = pd.merge(
        s[["user_id", "mx_cnt", "reaction_ratio"]],
        t,
        left_on=["user_id", "mx_cnt"],
        right_on=["user_id", "cnt"],
    )

    result = (
        merged[["user_id", "reaction", "reaction_ratio"]]
        .rename(columns={"reaction": "dominant_reaction"})
        .sort_values(by=["reaction_ratio", "user_id"], ascending=[False, True])
        .reset_index(drop=True)
    )

    return result

评论