3793. Find Users with High Token Usage

Description

Table: prompts

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| prompt      | varchar |
| tokens      | int     |
+-------------+---------+
(user_id, prompt) is the primary key (unique value) for this table.
Each row represents a prompt submitted by a user to an AI system along with the number of tokens consumed.

Write a solution to analyze AI prompt usage patterns based on the following requirements:

For each user, calculate the total number of prompts they have submitted.
For each user, calculate the average tokens used per prompt (Rounded to 2 decimal places).
Only include users who have submitted at least 3 prompts.
Only include users who have submitted at least one prompt with tokens greater than their own average token usage.

Return the result table ordered by average tokens in descending order, and then by user_id in ascending order.

The result format is in the following example.

Example:

Input:

prompts table:

+---------+--------------------------+--------+
| user_id | prompt                   | tokens |
+---------+--------------------------+--------+
| 1       | Write a blog outline     | 120    |
| 1       | Generate SQL query       | 80     |
| 1       | Summarize an article     | 200    |
| 2       | Create resume bullet     | 60     |
| 2       | Improve LinkedIn bio     | 70     |
| 3       | Explain neural networks  | 300    |
| 3       | Generate interview Q&A   | 250    |
| 3       | Write cover letter       | 180    |
| 3       | Optimize Python code     | 220    |
+---------+--------------------------+--------+

Output:

+---------+---------------+------------+
| user_id | prompt_count  | avg_tokens |
+---------+---------------+------------+
| 3       | 4             | 237.5      |
| 1       | 3             | 133.33     |
+---------+---------------+------------+

Explanation:

User 1:
- Total prompts = 3
- Average tokens = (120 + 80 + 200) / 3 = 133.33
- Has a prompt with 200 tokens, which is greater than the average
- Included in the result
User 2:
- Total prompts = 2 (less than the required minimum)
- Excluded from the result
User 3:
- Total prompts = 4
- Average tokens = (300 + 250 + 180 + 220) / 4 = 237.5
- Has prompts with 300 and 250 tokens, both greater than the average
- Included in the result

The Results table is ordered by avg_tokens in descending order, then by user_id in ascending order

Solutions

Solution 1: Group By Statistics

We first group the prompts by user_id and calculate for each user the total number of prompts prompt_count, the average tokens avg_tokens, and the maximum tokens max_tokens. Then we filter users who meet the criteria, i.e., those who have submitted at least 3 prompts and have at least one prompt with tokens greater than their average token usage. Finally, we sort the results by avg_tokens in descending order and by user_id in ascending order.

MySQLPandas

# Write your MySQL query statement below
SELECT
    user_id,
    COUNT(1) AS prompt_count,
    ROUND(AVG(tokens), 2) AS avg_tokens
FROM prompts
GROUP BY user_id
HAVING prompt_count >= 3 AND MAX(tokens) > avg_tokens
ORDER BY avg_tokens DESC, user_id;

import pandas as pd


def find_users_with_high_tokens(prompts: pd.DataFrame) -> pd.DataFrame:
    df = prompts.groupby("user_id", as_index=False).agg(
        prompt_count=("user_id", "size"),
        avg_tokens=("tokens", "mean"),
        max_tokens=("tokens", "max"),
    )

    df["avg_tokens"] = df["avg_tokens"].round(2)

    df = df[(df["prompt_count"] >= 3) & (df["max_tokens"] > df["avg_tokens"])]

    df = (
        df.sort_values(["avg_tokens", "user_id"], ascending=[False, True])
        .loc[:, ["user_id", "prompt_count", "avg_tokens"]]
        .reset_index(drop=True)
    )

    return df

3793. Find Users with High Token Usage

Description

Solutions

Solution 1: Group By Statistics

Comments