Independent aggregates of related entities while also sorting by a related entity (in a single statement)

October 18, 2020

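I have a model in which books and authors have a many-to-many relationship (a book can have multiple authors, and an author can write multiple books) through a table I have named authorships.
My goal is to query for some subset of books and get the collection of related authorships as well as the collection of related authors (i.e. without duplicates), each in a specific order. Essentially, I want to keep the normalized/separated structure of the records; I don't want to denormalize in any way (only sort).
Normally, I think you would do this with multiple statements, using functions or external code to supply the IDs for an IN expression or similar. However, I have been able to do it in a single statement in PostgreSQL using the following pattern: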
WITH matched_books AS (
 SELECT id, title FROM books
 -- Could be any criteria:
 WHERE title LIKE 'The %'
),
related_authorships AS (
 SELECT authorships.id, book_id, author_id
 FROM authorships
 JOIN matched_books ON book_id = matched_books.id
),
related_authors AS (
 SELECT id, name
 FROM authors
 -- Could also use DISTINCT and do a join here, but I understand EXISTS is typically better for performance:
 WHERE EXISTS (SELECT 1 FROM related_authorships WHERE author_id = authors.id)
)
SELECT
 -- Scalar subqueries that each return a single JSON array of objects:
 -- JSON is completely fine for my purposes, but could also use array_agg.
 (SELECT json_agg(matched_books.* ORDER BY title) FROM matched_books) books,
 (SELECT json_agg(related_authorships.* ORDER BY id) FROM related_authorships) authorships,
 (SELECT json_agg(related_authors.* ORDER BY name) FROM related_authors) authors;
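(Side note: in previous attempts I used LEFT JOINs at the top level together with json_agg(DISTINCT ...), but that prevented me from using ORDER BY in any meaningful way, and performance seemed more erratic/worse.)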
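The ORDER BY limitation in the side note is most likely PostgreSQL's rule that an aggregate using DISTINCT may only sort by expressions that also appear in its argument list. A minimal sketch, assuming made-up sample rows (the `joined` CTE and its values are hypothetical and only stand in for a top-level join result):

```sql
-- Hypothetical sample rows standing in for a books/authors join result.
WITH joined(book_id, title, author_name) AS (
  VALUES (1, 'The B', 'Zed'),
         (1, 'The B', 'Amy'),
         (2, 'The A', 'Zed')
)
SELECT
  -- Allowed: the sort expression is the aggregated argument itself.
  array_agg(DISTINCT title ORDER BY title) AS titles
  -- Not allowed: sorting a DISTINCT aggregate by a different expression, e.g.
  --   array_agg(DISTINCT title ORDER BY author_name)
  -- PostgreSQL rejects this because author_name is not in the argument list.
FROM joined;
```

The statement returns the two distinct titles in alphabetical order. Sorting them by anything else requires dropping DISTINCT or deduplicating in a subquery first.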
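While this approach works well for the most part, I now want to order books by something stored in their related authors and/or authorships. As an obvious example, suppose I want them sorted by the author's name or, if there are multiple authors, to use a column in authorships (in this case it could simply be the smallest integer id) to determine which author should be used.
I can't think of a way to allow this while still returning the collections independently, at least not without some duplicated work. How would you solve this?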

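I'm not sure I understood you correctly, but you probably need to group by the book's fields and aggregate the authors so that there is only one ORDER value per book. It could be the count of authors, or the min/max of name, or similar: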

WITH books AS (
 SELECT 1 AS id, 'The 1' AS title
  UNION ALL
 SELECT 2, 'The 2'
  UNION ALL
 SELECT 3, '3'
), authorships AS (
 SELECT 1 AS id, 1 AS book_id, 1 AS author_id
  UNION ALL
 SELECT 2, 1, 2
  UNION ALL
 SELECT 3, 1, 3
  UNION ALL
 SELECT 4, 2, 1
  UNION ALL
 SELECT 5, 3, 1
), authors AS (
 SELECT 1 AS id, 'name1' AS name
  UNION ALL
 SELECT 2, 'name2'
  UNION ALL
 SELECT 3, 'name3'
  UNION ALL
 SELECT 4, 'name4'
), filtered AS (
 SELECT book_id, title, ba.id, author_id, name
   FROM books AS b
   JOIN authorships AS ba ON ba.book_id = b.id
   JOIN authors AS a ON a.id = ba.author_id WHERE title LIKE 'The %'
) 
SELECT (
        SELECT json_agg(b.* ORDER BY count)
          FROM (
                 SELECT book_id AS id, title, count(name)
                   FROM filtered AS f GROUP BY 1,2
               ) AS f
          JOIN LATERAL (SELECT id, title) AS b ON true
       ) AS book;

          json_agg           
-----------------------------
[{"id":2,"title":"The 2"}, +
 {"id":1,"title":"The 1"}]
(1 row)
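The answer also mentions min/max of name as a possible ordering value. A minimal sketch of that variant, assuming made-up rows in a `filtered` CTE with the same shape as the one above (the data, the `first_name` alias, and the use of `json_build_object` are illustrative choices, not part of the original answer):

```sql
-- Hypothetical sample rows: book 1 has two authors, book 2 has one.
WITH filtered(book_id, title, name) AS (
  VALUES (1, 'The 1', 'name2'),
         (1, 'The 1', 'name3'),
         (2, 'The 2', 'name1')
)
SELECT json_agg(json_build_object('id', id, 'title', title)
                ORDER BY first_name, id) AS books
FROM (
  -- One row per book, carrying the alphabetically first author name.
  SELECT book_id AS id, title, min(name) AS first_name
  FROM filtered
  GROUP BY book_id, title
) AS per_book;
```

With these rows, book 2 sorts before book 1 because its only author, name1, precedes name2. The extra `id` sort key only breaks ties between books that share the same first author name.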

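So, after much trial and error, I came up with two approaches that run at acceptable speed.

Using the approach from the example in the question, I determined that there is unfortunately no way to avoid repeating the joins: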

WITH matched_books AS (
 SELECT id, title FROM books
 WHERE title LIKE 'The %'
),
related_authorships AS (
 SELECT authorships.id, book_id, author_id
 FROM authorships
 JOIN matched_books ON book_id = matched_books.id
),
related_authors AS (
 SELECT id, name
 FROM authors
 WHERE EXISTS (SELECT 1 FROM related_authorships WHERE author_id = authors.id)
)
SELECT
 (SELECT json_agg(matched_books.* ORDER BY first_author_name)
   FROM matched_books
   LEFT JOIN (
     SELECT DISTINCT ON (book_id) book_id, name AS first_author_name
     FROM related_authorships
     LEFT JOIN related_authors ON author_id = related_authors.id
     ORDER BY book_id, related_authorships.id
   ) sub ON book_id = matched_books.id
 ) books,
 (SELECT json_agg(related_authorships.* ORDER BY id) FROM related_authorships) authorships,
 (SELECT json_agg(related_authors.* ORDER BY name) FROM related_authors) authors;
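The subquery that drives the ordering relies on DISTINCT ON, a PostgreSQL extension that keeps only the first row of each group as determined by the ORDER BY. A minimal sketch in isolation, assuming made-up authorship rows (the values are hypothetical):

```sql
-- Hypothetical authorships: book 1 has two authors, book 2 has one.
WITH related_authorships(id, book_id, author_id) AS (
  VALUES (10, 1, 3),
         (11, 1, 2),
         (12, 2, 2)
)
-- Keep one row per book: the authorship with the lowest id.
SELECT DISTINCT ON (book_id) book_id, author_id AS first_author_id
FROM related_authorships
ORDER BY book_id, id;
```

This yields author 3 for book 1 (authorship 10 precedes 11) and author 2 for book 2. The leading ORDER BY expression has to match the DISTINCT ON expression; the remaining sort keys decide which row in each group wins.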

The EXPLAIN plan looks like this:

Result  (cost=1344.93..1344.94 rows=1 width=96)
 CTE matched_books
   ->  Seq Scan on books  (cost=0.00..157.94 rows=1169 width=23)
         Filter: (title ~~ 'The %'::text)
 CTE related_authorships
   ->  Hash Join  (cost=128.05..173.70 rows=1204 width=12)
         Hash Cond: (matched_books.id = authorships.book_id)
         ->  CTE Scan on matched_books  (cost=0.00..23.38 rows=1169 width=4)
         ->  Hash  (cost=74.69..74.69 rows=4269 width=12)
               ->  Seq Scan on authorships  (cost=0.00..74.69 rows=4269 width=12)
 CTE related_authors
   ->  Hash Join  (cost=31.59..138.06 rows=1204 width=18)
         Hash Cond: (authors.id = related_authorships.author_id)
         ->  Seq Scan on authors  (cost=0.00..85.99 rows=2699 width=18)
         ->  Hash  (cost=29.09..29.09 rows=200 width=4)
               ->  HashAggregate  (cost=27.09..29.09 rows=200 width=4)
                     Group Key: related_authorships.author_id
                     ->  CTE Scan on related_authorships  (cost=0.00..24.08 rows=1204 width=4)
 InitPlan 4 (returns $3)
   ->  Aggregate  (cost=821.01..821.02 rows=1 width=32)
         ->  Hash Left Join  (cost=791.57..818.09 rows=1169 width=60)
               Hash Cond: (matched_books_1.id = sub.book_id)
               ->  CTE Scan on matched_books matched_books_1  (cost=0.00..23.38 rows=1169 width=32)
               ->  Hash  (cost=789.07..789.07 rows=200 width=36)
                     ->  Subquery Scan on sub  (cost=750.83..789.07 rows=200 width=36)
                           ->  Unique  (cost=750.83..787.07 rows=200 width=40)
                                 ->  Sort  (cost=750.83..768.95 rows=7248 width=40)
                                       Sort Key: related_authorships_1.book_id, related_authorships_1.id
                                       ->  Merge Left Join  (cost=171.37..286.11 rows=7248 width=40)
                                             Merge Cond: (related_authorships_1.author_id = related_authors.id)
                                             ->  Sort  (cost=85.69..88.70 rows=1204 width=12)
                                                   Sort Key: related_authorships_1.author_id
                                                   ->  CTE Scan on related_authorships related_authorships_1  (cost=0.00..24.08 rows=1204 width=12)
                                             ->  Sort  (cost=85.69..88.70 rows=1204 width=36)
                                                   Sort Key: related_authors.id
                                                   ->  CTE Scan on related_authors  (cost=0.00..24.08 rows=1204 width=36)
 InitPlan 5 (returns $4)
   ->  Aggregate  (cost=27.09..27.10 rows=1 width=32)
         ->  CTE Scan on related_authorships related_authorships_2  (cost=0.00..24.08 rows=1204 width=32)
 InitPlan 6 (returns $5)
   ->  Aggregate  (cost=27.09..27.10 rows=1 width=32)
         ->  CTE Scan on related_authors related_authors_1  (cost=0.00..24.08 rows=1204 width=88)

Question: aside from the initial LIKE condition, am I missing any obvious indexes or other optimizations?

The second approach is to join everything first and then extract each entity type, which is admittedly a bit awkward:

WITH joined AS (
 -- Use row/composite values to keep things separate
 SELECT books, authorships, authors
 FROM (SELECT id, title FROM books) books
 LEFT JOIN (SELECT id, book_id, author_id FROM authorships) authorships ON books.id = authorships.book_id
 LEFT JOIN (SELECT id, name FROM authors) authors ON authors.id = authorships.author_id
 WHERE title LIKE 'The %'
),
related_authorships AS (
 SELECT DISTINCT ON ((authorships).id) (authorships).*
 FROM joined
 WHERE (authorships).id IS NOT NULL
),
related_authors AS (
 SELECT DISTINCT ON ((authors).id) (authors).*
 FROM joined
 WHERE (authors).id IS NOT NULL
)
SELECT
 (SELECT json_agg(books ORDER BY first_author_name)
   FROM (
     SELECT DISTINCT ON ((books).id) books, (authors).name AS first_author_name
     FROM joined
     ORDER BY (books).id, (authorships).id
   ) sub
 ) books,
 (SELECT json_agg(related_authorships.* ORDER BY id) FROM related_authorships) authorships,
 (SELECT json_agg(related_authors.* ORDER BY name) FROM related_authors) authors;
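The second approach depends on selecting whole rows as composite values and then reading individual fields with the parenthesized `(alias).column` syntax. A minimal sketch of just that mechanism, assuming made-up inline rows in place of the real tables (the data is hypothetical):

```sql
WITH joined AS (
  -- Selecting a subquery alias by name yields its whole row as one composite value.
  SELECT books, authors
  FROM (SELECT 1 AS id, 'The 1' AS title
        UNION ALL
        SELECT 2, 'The 2') books
  LEFT JOIN (SELECT 1 AS id, 'name1' AS name) authors ON authors.id = books.id
)
-- Parentheses are required so the parser treats the name as a column, not a table.
SELECT (books).id AS book_id, (books).title, (authors).name AS author_name
FROM joined
ORDER BY (books).id;
```

The second book has no match in this sample, so its `authors` value is NULL and `(authors).name` is NULL as well. That is why the extraction CTEs above filter on `(authors).id IS NOT NULL`.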

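I won't paste the query plan; its estimated cost is lower than the first query's, but in practice it takes slightly longer on average (I know I could apply some micro-optimizations to this particular version, but I left it this way for clarity). Combined with the awkwardness, I prefer the first approach.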

Source: https://dba.stackexchange.com/questions/275125
