跳转至

3642. 查找有两极分化观点的书籍

题目描述

表:books

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| book_id     | int     |
| title       | varchar |
| author      | varchar |
| genre       | varchar |
| pages       | int     |
+-------------+---------+
book_id 是这张表的唯一主键。
每一行包含关于一本书的信息,包括其类型和页数。

表:reading_sessions

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| session_id     | int     |
| book_id        | int     |
| reader_name    | varchar |
| pages_read     | int     |
| session_rating | int     |
+----------------+---------+
session_id 是这张表的唯一主键。
每一行代表一次阅读事件,有人阅读了书籍的一部分。session_rating 在 1-5 的范围内。

编写一个解决方案来找到具有 两极分化观点 的书 - 同时获得不同读者极高和极低评分的书籍。

  • 如果一本书有至少一个大于等于 4 的评分和至少一个小于等于 2 的评分则是有两极分化观点的书
  • 只考虑有至少 5 次阅读事件的书籍
  • 按 highest_rating - lowest_rating 计算评分差幅 rating spread
  • 按极端评分(评分小于等于 2 或大于等于 4)的数量除以总阅读事件计算 极化得分 polarization score
  • 只包含 极化得分大于等于 0.6 的书(至少 60% 极端评分)

返回结果表按极化得分 降序 排序,然后按标题 降序 排序。

返回格式如下所示。

 

示例:

输入:

books 表:

+---------+------------------------+---------------+----------+-------+
| book_id | title                  | author        | genre    | pages |
+---------+------------------------+---------------+----------+-------+
| 1       | The Great Gatsby       | F. Scott      | Fiction  | 180   |
| 2       | To Kill a Mockingbird  | Harper Lee    | Fiction  | 281   |
| 3       | 1984                   | George Orwell | Dystopian| 328   |
| 4       | Pride and Prejudice    | Jane Austen   | Romance  | 432   |
| 5       | The Catcher in the Rye | J.D. Salinger | Fiction  | 277   |
+---------+------------------------+---------------+----------+-------+

reading_sessions 表:

+------------+---------+-------------+------------+----------------+
| session_id | book_id | reader_name | pages_read | session_rating |
+------------+---------+-------------+------------+----------------+
| 1          | 1       | Alice       | 50         | 5              |
| 2          | 1       | Bob         | 60         | 1              |
| 3          | 1       | Carol       | 40         | 4              |
| 4          | 1       | David       | 30         | 2              |
| 5          | 1       | Emma        | 45         | 5              |
| 6          | 2       | Frank       | 80         | 4              |
| 7          | 2       | Grace       | 70         | 4              |
| 8          | 2       | Henry       | 90         | 5              |
| 9          | 2       | Ivy         | 60         | 4              |
| 10         | 2       | Jack        | 75         | 4              |
| 11         | 3       | Kate        | 100        | 2              |
| 12         | 3       | Liam        | 120        | 1              |
| 13         | 3       | Mia         | 80         | 2              |
| 14         | 3       | Noah        | 90         | 1              |
| 15         | 3       | Olivia      | 110        | 4              |
| 16         | 3       | Paul        | 95         | 5              |
| 17         | 4       | Quinn       | 150        | 3              |
| 18         | 4       | Ruby        | 140        | 3              |
| 19         | 5       | Sam         | 80         | 1              |
| 20         | 5       | Tara        | 70         | 2              |
+------------+---------+-------------+------------+----------------+

输出:

+---------+------------------+---------------+-----------+-------+---------------+--------------------+
| book_id | title            | author        | genre     | pages | rating_spread | polarization_score |
+---------+------------------+---------------+-----------+-------+---------------+--------------------+
| 1       | The Great Gatsby | F. Scott      | Fiction   | 180   | 4             | 1.00               |
| 3       | 1984             | George Orwell | Dystopian | 328   | 4             | 1.00               |
+---------+------------------+---------------+-----------+-------+---------------+--------------------+

解释:

  • 了不起的盖茨比(book_id = 1):
    • 有 5 次阅读事件(满足最少要求)
    • 评分:5, 1, 4, 2, 5
    • 大于等于 4 的评分:5,4,5(3 次事件)
    • 小于等于 2 的评分:1,2(2 次事件)
    • 评分差:5 - 1 = 4
    • 极端评分(≤2 或 ≥4):所有 5 次事件(5,1,4,2,5)
    • 极化得分:5/5 = 1.00(≥ 0.6,符合)
  • 1984 (book_id = 3):
    • 有 6 次阅读事件(满足最少要求)
    • 评分:2,1,2,1,4,5
    • 大于等于 4 的评分:4,5(2 次事件)
    • 小于等于 2 的评分:2,1,2,1(4 次事件)
    • 评分差:5 - 1 = 4
    • 极端评分(≤2 或 ≥4):所有 6 次事件(2,1,2,1,4,5)
    • 极化得分:6/6 = 1.00 (≥ 0.6,符合)
  • 未包含的书:
    • 杀死一只知更鸟(book_id = 2):所有评分为 4-5,没有低分(≤2)
    • 傲慢与偏见(book_id = 4):只有 2 次事件(< 最少 5 次)
    • 麦田里的守望者(book_id = 5):只有 2 次事件(< 最少 5 次)

结果表按极化得分降序排序,然后按标题降序排序。

解法

方法一:连接 + 分组聚合

我们可以通过连接 books 表和 reading_sessions 表,然后对结果进行分组和聚合来实现。

首先,我们需要计算每本书的评分范围、极端评分的数量和极端评分的比例。

然后,我们可以根据这些指标筛选出符合条件的书籍。

最后,按照极端评分比例和书名的降序排列结果。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
# Write your MySQL query statement below
SELECT
    book_id,
    title,
    author,
    genre,
    pages,
    (MAX(session_rating) - MIN(session_rating)) AS rating_spread,
    ROUND((SUM(session_rating <= 2) + SUM(session_rating >= 4)) / COUNT(1), 2) polarization_score
FROM
    books
    JOIN reading_sessions USING (book_id)
GROUP BY book_id
HAVING
    COUNT(1) >= 5
    AND MAX(session_rating) >= 4
    AND MIN(session_rating) <= 2
    AND polarization_score >= 0.6
ORDER BY polarization_score DESC, title DESC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
import pandas as pd
from decimal import Decimal, ROUND_HALF_UP


def find_polarized_books(
    books: pd.DataFrame, reading_sessions: pd.DataFrame
) -> pd.DataFrame:
    df = books.merge(reading_sessions, on="book_id")
    agg_df = (
        df.groupby(["book_id", "title", "author", "genre", "pages"])
        .agg(
            max_rating=("session_rating", "max"),
            min_rating=("session_rating", "min"),
            rating_spread=("session_rating", lambda x: x.max() - x.min()),
            count_sessions=("session_rating", "count"),
            low_or_high_count=("session_rating", lambda x: ((x <= 2) | (x >= 4)).sum()),
        )
        .reset_index()
    )

    agg_df["polarization_score"] = agg_df.apply(
        lambda r: float(
            Decimal(r["low_or_high_count"] / r["count_sessions"]).quantize(
                Decimal("0.01"), rounding=ROUND_HALF_UP
            )
        ),
        axis=1,
    )

    result = agg_df[
        (agg_df["count_sessions"] >= 5)
        & (agg_df["max_rating"] >= 4)
        & (agg_df["min_rating"] <= 2)
        & (agg_df["polarization_score"] >= 0.6)
    ]

    return result.sort_values(
        by=["polarization_score", "title"], ascending=[False, False]
    )[
        [
            "book_id",
            "title",
            "author",
            "genre",
            "pages",
            "rating_spread",
            "polarization_score",
        ]
    ]

评论