跳转至

3642. Find Books with Polarized Opinions

题目描述

Table: books

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| book_id     | int     |
| title       | varchar |
| author      | varchar |
| genre       | varchar |
| pages       | int     |
+-------------+---------+
book_id is the unique ID for this table.
Each row contains information about a book including its genre and page count.

Table: reading_sessions

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| session_id     | int     |
| book_id        | int     |
| reader_name    | varchar |
| pages_read     | int     |
| session_rating | int     |
+----------------+---------+
session_id is the unique ID for this table.
Each row represents a reading session where someone read a portion of a book. session_rating is on a scale of 1-5.

Write a solution to find books that have polarized opinions - books that receive both very high ratings and very low ratings from different readers.

  • A book has polarized opinions if it has at least one rating ≥ 4 and at least one rating ≤ 2
  • Only consider books that have at least 5 reading sessions
  • Calculate the rating spread as (highest_rating - lowest_rating)
  • Calculate the polarization score as the number of extreme ratings (ratings ≤ 2 or ≥ 4) divided by total sessions
  • Only include books where polarization score ≥ 0.6 (at least 60% extreme ratings)

Return the result table ordered by polarization score in descending order, then by title in descending order.

The result format is in the following example.

 

Example:

Input:

books table:

+---------+------------------------+---------------+----------+-------+
| book_id | title                  | author        | genre    | pages |
+---------+------------------------+---------------+----------+-------+
| 1       | The Great Gatsby       | F. Scott      | Fiction  | 180   |
| 2       | To Kill a Mockingbird  | Harper Lee    | Fiction  | 281   |
| 3       | 1984                   | George Orwell | Dystopian| 328   |
| 4       | Pride and Prejudice    | Jane Austen   | Romance  | 432   |
| 5       | The Catcher in the Rye | J.D. Salinger | Fiction  | 277   |
+---------+------------------------+---------------+----------+-------+

reading_sessions table:

+------------+---------+-------------+------------+----------------+
| session_id | book_id | reader_name | pages_read | session_rating |
+------------+---------+-------------+------------+----------------+
| 1          | 1       | Alice       | 50         | 5              |
| 2          | 1       | Bob         | 60         | 1              |
| 3          | 1       | Carol       | 40         | 4              |
| 4          | 1       | David       | 30         | 2              |
| 5          | 1       | Emma        | 45         | 5              |
| 6          | 2       | Frank       | 80         | 4              |
| 7          | 2       | Grace       | 70         | 4              |
| 8          | 2       | Henry       | 90         | 5              |
| 9          | 2       | Ivy         | 60         | 4              |
| 10         | 2       | Jack        | 75         | 4              |
| 11         | 3       | Kate        | 100        | 2              |
| 12         | 3       | Liam        | 120        | 1              |
| 13         | 3       | Mia         | 80         | 2              |
| 14         | 3       | Noah        | 90         | 1              |
| 15         | 3       | Olivia      | 110        | 4              |
| 16         | 3       | Paul        | 95         | 5              |
| 17         | 4       | Quinn       | 150        | 3              |
| 18         | 4       | Ruby        | 140        | 3              |
| 19         | 5       | Sam         | 80         | 1              |
| 20         | 5       | Tara        | 70         | 2              |
+------------+---------+-------------+------------+----------------+

Output:

+---------+------------------+---------------+-----------+-------+---------------+--------------------+
| book_id | title            | author        | genre     | pages | rating_spread | polarization_score |
+---------+------------------+---------------+-----------+-------+---------------+--------------------+
| 1       | The Great Gatsby | F. Scott      | Fiction   | 180   | 4             | 1.00               |
| 3       | 1984             | George Orwell | Dystopian | 328   | 4             | 1.00               |
+---------+------------------+---------------+-----------+-------+---------------+--------------------+

Explanation:

  • The Great Gatsby (book_id = 1):
    • Has 5 reading sessions (meets minimum requirement)
    • Ratings: 5, 1, 4, 2, 5
    • Has ratings ≥ 4: 5, 4, 5 (3 sessions)
    • Has ratings ≤ 2: 1, 2 (2 sessions)
    • Rating spread: 5 - 1 = 4
    • Extreme ratings (≤2 or ≥4): All 5 sessions (5, 1, 4, 2, 5)
    • Polarization score: 5/5 = 1.00 (≥ 0.6, qualifies)
  • 1984 (book_id = 3):
    • Has 6 reading sessions (meets minimum requirement)
    • Ratings: 2, 1, 2, 1, 4, 5
    • Has ratings ≥ 4: 4, 5 (2 sessions)
    • Has ratings ≤ 2: 2, 1, 2, 1 (4 sessions)
    • Rating spread: 5 - 1 = 4
    • Extreme ratings (≤2 or ≥4): All 6 sessions (2, 1, 2, 1, 4, 5)
    • Polarization score: 6/6 = 1.00 (≥ 0.6, qualifies)
  • Books not included:
    • To Kill a Mockingbird (book_id = 2): All ratings are 4-5, no low ratings (≤2)
    • Pride and Prejudice (book_id = 4): Only 2 sessions (< 5 minimum)
    • The Catcher in the Rye (book_id = 5): Only 2 sessions (< 5 minimum)

The result table is ordered by polarization score in descending order, then by book title in descending order.

解法

方法一:连接 + 分组聚合

我们可以通过连接 books 表和 reading_sessions 表,然后对结果进行分组和聚合来实现。

首先,我们需要计算每本书的评分范围、极端评分的数量和极端评分的比例。

然后,我们可以根据这些指标筛选出符合条件的书籍。

最后,按照极端评分比例和书名的降序排列结果。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
# Write your MySQL query statement below
SELECT
    book_id,
    title,
    author,
    genre,
    pages,
    (MAX(session_rating) - MIN(session_rating)) AS rating_spread,
    ROUND((SUM(session_rating <= 2) + SUM(session_rating >= 4)) / COUNT(1), 2) polarization_score
FROM
    books
    JOIN reading_sessions USING (book_id)
GROUP BY book_id
HAVING
    COUNT(1) >= 5
    AND MAX(session_rating) >= 4
    AND MIN(session_rating) <= 2
    AND polarization_score >= 0.6
ORDER BY polarization_score DESC, title DESC;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
import pandas as pd
from decimal import Decimal, ROUND_HALF_UP


def find_polarized_books(
    books: pd.DataFrame, reading_sessions: pd.DataFrame
) -> pd.DataFrame:
    df = books.merge(reading_sessions, on="book_id")
    agg_df = (
        df.groupby(["book_id", "title", "author", "genre", "pages"])
        .agg(
            max_rating=("session_rating", "max"),
            min_rating=("session_rating", "min"),
            rating_spread=("session_rating", lambda x: x.max() - x.min()),
            count_sessions=("session_rating", "count"),
            low_or_high_count=("session_rating", lambda x: ((x <= 2) | (x >= 4)).sum()),
        )
        .reset_index()
    )

    agg_df["polarization_score"] = agg_df.apply(
        lambda r: float(
            Decimal(r["low_or_high_count"] / r["count_sessions"]).quantize(
                Decimal("0.01"), rounding=ROUND_HALF_UP
            )
        ),
        axis=1,
    )

    result = agg_df[
        (agg_df["count_sessions"] >= 5)
        & (agg_df["max_rating"] >= 4)
        & (agg_df["min_rating"] <= 2)
        & (agg_df["polarization_score"] >= 0.6)
    ]

    return result.sort_values(
        by=["polarization_score", "title"], ascending=[False, False]
    )[
        [
            "book_id",
            "title",
            "author",
            "genre",
            "pages",
            "rating_spread",
            "polarization_score",
        ]
    ]

评论