跳转至

3793. Find Users with High Token Usage

题目描述

Table: prompts

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| prompt      | varchar |
| tokens      | int     |
+-------------+---------+
(user_id, prompt) is the primary key (unique value) for this table.
Each row represents a prompt submitted by a user to an AI system along with the number of tokens consumed.

Write a solution to analyze AI prompt usage patterns based on the following requirements:

  • For each user, calculate the total number of prompts they have submitted.
  • For each user, calculate the average tokens used per prompt (Rounded to 2 decimal places).
  • Only include users who have submitted at least 3 prompts.
  • Only include users who have submitted at least one prompt with tokens greater than their own average token usage.

Return the result table ordered by average tokens in descending order, and then by user_id in ascending order.

The result format is in the following example.

 

Example:

Input:

prompts table:

+---------+--------------------------+--------+
| user_id | prompt                   | tokens |
+---------+--------------------------+--------+
| 1       | Write a blog outline     | 120    |
| 1       | Generate SQL query       | 80     |
| 1       | Summarize an article     | 200    |
| 2       | Create resume bullet     | 60     |
| 2       | Improve LinkedIn bio     | 70     |
| 3       | Explain neural networks  | 300    |
| 3       | Generate interview Q&A   | 250    |
| 3       | Write cover letter       | 180    |
| 3       | Optimize Python code     | 220    |
+---------+--------------------------+--------+

Output:

+---------+---------------+------------+
| user_id | prompt_count  | avg_tokens |
+---------+---------------+------------+
| 3       | 4             | 237.5      |
| 1       | 3             | 133.33     |
+---------+---------------+------------+

Explanation:

  • User 1:
    • Total prompts = 3
    • Average tokens = (120 + 80 + 200) / 3 = 133.33
    • Has a prompt with 200 tokens, which is greater than the average
    • Included in the result
  • User 2:
    • Total prompts = 2 (less than the required minimum)
    • Excluded from the result
  • User 3:
    • Total prompts = 4
    • Average tokens = (300 + 250 + 180 + 220) / 4 = 237.5
    • Has prompts with 300 and 250 tokens, both greater than the average
    • Included in the result

The Results table is ordered by avg_tokens in descending order, then by user_id in ascending order

解法

方法一:分组统计

我们首先将数组按照 user_id 进行分组统计,计算每个用户的提示词数量 prompt_count、平均令牌数 avg_tokens 以及最大令牌数 max_tokens。然后筛选出满足条件的用户,即提示词数量不少于 3 且存在提示词的令牌数大于平均令牌数的用户。最后按照平均令牌数降序和用户 ID 升序排序输出结果。

1
2
3
4
5
6
7
8
9
# Write your MySQL query statement below
SELECT
    user_id,
    COUNT(1) AS prompt_count,
    ROUND(AVG(tokens), 2) AS avg_tokens
FROM prompts
GROUP BY user_id
HAVING prompt_count >= 3 AND MAX(tokens) > avg_tokens
ORDER BY avg_tokens DESC, user_id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
import pandas as pd


def find_users_with_high_tokens(prompts: pd.DataFrame) -> pd.DataFrame:
    df = prompts.groupby("user_id", as_index=False).agg(
        prompt_count=("user_id", "size"),
        avg_tokens=("tokens", "mean"),
        max_tokens=("tokens", "max"),
    )

    df["avg_tokens"] = df["avg_tokens"].round(2)

    df = df[(df["prompt_count"] >= 3) & (df["max_tokens"] > df["avg_tokens"])]

    df = (
        df.sort_values(["avg_tokens", "user_id"], ascending=[False, True])
        .loc[:, ["user_id", "prompt_count", "avg_tokens"]]
        .reset_index(drop=True)
    )

    return df

评论