Independent aggregation of related entities while also ordering by related entities (in a single statement)
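I have a model of books and their authors, with a many-to-many relationship between them (since a book can have multiple authors, and an author can write multiple books) through a table I've named authorships.

My goal is to query some subset of books and get the set of related authorships, as well as the set of related authors (i.e. without duplicates), each in a specific order. Essentially, I want to keep the normalized/separate structure of the records; I don't want to denormalize in any way (only order).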
Normally, I think you would do this with multiple statements, using functions or external code to supply IDs for IN expressions or similar. However, I've been able to do it in a single statement in PostgreSQL using the following pattern:

    WITH matched_books AS (
      SELECT id, title
      FROM books
      -- Could be any criteria:
      WHERE title LIKE 'The %'
    ),
    related_authorships AS (
      SELECT authorships.id, book_id, author_id
      FROM authorships
      JOIN matched_books ON book_id = matched_books.id
    ),
    related_authors AS (
      SELECT id, name
      FROM authors
      -- Could also use DISTINCT and do a join here, but I understand EXISTS is typically better for performance:
      WHERE EXISTS (SELECT 1 FROM related_authorships WHERE author_id = authors.id)
    )
    SELECT
      -- Scalar subqueries that each return a single JSON array of objects:
      -- JSON is completely fine for my purposes, but could also use array_agg.
      (SELECT json_agg(matched_books.* ORDER BY title) FROM matched_books) books,
      (SELECT json_agg(related_authorships.* ORDER BY id) FROM related_authorships) authorships,
      (SELECT json_agg(related_authors.* ORDER BY name) FROM related_authors) authors;
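For anyone who wants to try the pattern, here is a minimal schema sketch. Only the table and column names are taken from the query above; the column types, constraints, and sample rows are assumptions made purely for illustration.

```sql
-- Hypothetical schema: names come from the query above,
-- types and constraints are assumed.
CREATE TABLE books (
    id    integer PRIMARY KEY,
    title text NOT NULL
);

CREATE TABLE authors (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

-- Junction table for the many-to-many relationship.
CREATE TABLE authorships (
    id        integer PRIMARY KEY,
    book_id   integer NOT NULL REFERENCES books (id),
    author_id integer NOT NULL REFERENCES authors (id)
);

-- Made-up sample rows: book 1 has three authors, book 3 does not match 'The %',
-- and author 4 has written nothing.
INSERT INTO books   VALUES (1, 'The 1'), (2, 'The 2'), (3, '3');
INSERT INTO authors VALUES (1, 'name1'), (2, 'name2'), (3, 'name3'), (4, 'name4');
INSERT INTO authorships VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 3, 1);
```

With rows like these, the three scalar subqueries should each return one JSON array, and neither book 3 nor author 4 should appear in any of them.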
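(Side note: in previous attempts, I used LEFT JOINs at the top level together with json_agg(DISTINCT ...), but that prevented me from using ORDER BY meaningfully, and it seemed messier and performed worse.)

While this approach works almost perfectly, I now want to order books by their related authors and/or authorships. As an obvious example, say I want them sorted by the name of the author or, if there are multiple authors, to use a column in authorships (here it could simply be the smallest id) to determine which author should be used.

I can't think of a way to allow that while still returning the sets independently, at least not without some repeated operations. How would you approach this?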
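I'm not sure I understood you correctly, but you probably need to group the other author fields so that there is only one ORDER value per book. It could be the count of authors, or the min / max of name, or something similar: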
    WITH books AS (
      SELECT 1 AS id, 'The 1' AS title UNION ALL
      SELECT 2, 'The 2' UNION ALL
      SELECT 3, '3'
    ),
    authorships AS (
      SELECT 1 AS id, 1 AS book_id, 1 AS author_id UNION ALL
      SELECT 2, 1, 2 UNION ALL
      SELECT 3, 1, 3 UNION ALL
      SELECT 4, 2, 1 UNION ALL
      SELECT 5, 3, 1
    ),
    authors AS (
      SELECT 1 AS id, 'name1' AS name UNION ALL
      SELECT 2, 'name2' UNION ALL
      SELECT 3, 'name3' UNION ALL
      SELECT 4, 'name4'
    ),
    filtered AS (
      SELECT book_id, title, ba.id, author_id, name
      FROM books AS b
      JOIN authorships AS ba ON ba.book_id = b.id
      JOIN authors AS a ON a.id = ba.author_id
      WHERE title LIKE 'The %'
    )
    SELECT (
      -- One row per book, with the author count as the single ORDER value
      SELECT json_agg(b.* ORDER BY count)
      FROM (
        SELECT book_id AS id, title, count(name)
        FROM filtered AS f
        GROUP BY 1, 2
      ) AS f
      -- LATERAL projects only the book columns, so count stays out of the JSON
      JOIN LATERAL (SELECT id, title) AS b ON true
    ) AS book;

              json_agg
    -----------------------------
     [{"id":2,"title":"The 2"}, +
      {"id":1,"title":"The 1"}]
    (1 row)
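The min / max variant mentioned above could look roughly like the following. This is only a sketch: the inline `filtered` rows are invented sample data, `first_name` is a hypothetical alias, and `json_build_object` is swapped in for the LATERAL projection to keep the example short.

```sql
-- Invented sample rows standing in for the joined and filtered result.
WITH filtered (book_id, title, name) AS (
    VALUES (1, 'The 1', 'name2'),
           (1, 'The 1', 'name3'),
           (2, 'The 2', 'name1')
)
SELECT json_agg(
           json_build_object('id', id, 'title', title)
           ORDER BY first_name, id   -- id breaks ties between equal names
       ) AS book
FROM (
    -- Collapse each book to one row carrying a single sortable value.
    SELECT book_id AS id, title, min(name) AS first_name
    FROM filtered
    GROUP BY book_id, title
) AS per_book;
```

Here book 2 should sort ahead of book 1, because its minimum author name ('name1') compares lower than book 1's ('name2'). Note that min(name) picks the alphabetically first author, which is not necessarily the author with the lowest authorship id.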
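So, after a lot of trial and error, I was able to come up with two approaches that work at an acceptable speed.

Using the approach from the example in the question, I determined that there is unfortunately no way around repeating the joins: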
    WITH matched_books AS (
      SELECT id, title
      FROM books
      WHERE title LIKE 'The %'
    ),
    related_authorships AS (
      SELECT authorships.id, book_id, author_id
      FROM authorships
      JOIN matched_books ON book_id = matched_books.id
    ),
    related_authors AS (
      SELECT id, name
      FROM authors
      WHERE EXISTS (SELECT 1 FROM related_authorships WHERE author_id = authors.id)
    )
    SELECT
      (SELECT json_agg(matched_books.* ORDER BY first_author_name)
       FROM matched_books
       LEFT JOIN (
         SELECT DISTINCT ON (book_id) book_id, name AS first_author_name
         FROM related_authorships
         LEFT JOIN related_authors ON author_id = related_authors.id
         ORDER BY book_id, related_authorships.id
       ) sub ON book_id = matched_books.id
      ) books,
      (SELECT json_agg(related_authorships.* ORDER BY id) FROM related_authorships) authorships,
      (SELECT json_agg(related_authors.* ORDER BY name) FROM related_authors) authors;
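The DISTINCT ON subquery is what picks one "first author" per book. To see that step in isolation, here is a small standalone sketch with invented rows (the author names and IDs are made up for illustration):

```sql
WITH related_authorships (id, book_id, author_id) AS (
    VALUES (1, 1, 3),   -- book 1, lowest authorship id, author 3
           (2, 1, 1),   -- book 1, second authorship, author 1
           (3, 2, 2)    -- book 2, only authorship, author 2
),
related_authors (id, name) AS (
    VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark')
)
-- DISTINCT ON keeps the first row of each book_id group,
-- where "first" is decided by the ORDER BY that follows.
SELECT DISTINCT ON (book_id)
       book_id,
       name AS first_author_name
FROM related_authorships
LEFT JOIN related_authors ON author_id = related_authors.id
ORDER BY book_id, related_authorships.id;
```

For book 1 this should return 'Clark' rather than 'Adams', because the authorship with the lowest id decides, not the alphabetical order of the names. The leading ORDER BY expressions have to match the DISTINCT ON expressions, which is why book_id comes first.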
The EXPLAIN plan looks like this:

    Result  (cost=1344.93..1344.94 rows=1 width=96)
      CTE matched_books
        ->  Seq Scan on books  (cost=0.00..157.94 rows=1169 width=23)
              Filter: (title ~~ 'The %'::text)
      CTE related_authorships
        ->  Hash Join  (cost=128.05..173.70 rows=1204 width=12)
              Hash Cond: (matched_books.id = authorships.book_id)
              ->  CTE Scan on matched_books  (cost=0.00..23.38 rows=1169 width=4)
              ->  Hash  (cost=74.69..74.69 rows=4269 width=12)
                    ->  Seq Scan on authorships  (cost=0.00..74.69 rows=4269 width=12)
      CTE related_authors
        ->  Hash Join  (cost=31.59..138.06 rows=1204 width=18)
              Hash Cond: (authors.id = related_authorships.author_id)
              ->  Seq Scan on authors  (cost=0.00..85.99 rows=2699 width=18)
              ->  Hash  (cost=29.09..29.09 rows=200 width=4)
                    ->  HashAggregate  (cost=27.09..29.09 rows=200 width=4)
                          Group Key: related_authorships.author_id
                          ->  CTE Scan on related_authorships  (cost=0.00..24.08 rows=1204 width=4)
      InitPlan 4 (returns $3)
        ->  Aggregate  (cost=821.01..821.02 rows=1 width=32)
              ->  Hash Left Join  (cost=791.57..818.09 rows=1169 width=60)
                    Hash Cond: (matched_books_1.id = sub.book_id)
                    ->  CTE Scan on matched_books matched_books_1  (cost=0.00..23.38 rows=1169 width=32)
                    ->  Hash  (cost=789.07..789.07 rows=200 width=36)
                          ->  Subquery Scan on sub  (cost=750.83..789.07 rows=200 width=36)
                                ->  Unique  (cost=750.83..787.07 rows=200 width=40)
                                      ->  Sort  (cost=750.83..768.95 rows=7248 width=40)
                                            Sort Key: related_authorships_1.book_id, related_authorships_1.id
                                            ->  Merge Left Join  (cost=171.37..286.11 rows=7248 width=40)
                                                  Merge Cond: (related_authorships_1.author_id = related_authors.id)
                                                  ->  Sort  (cost=85.69..88.70 rows=1204 width=12)
                                                        Sort Key: related_authorships_1.author_id
                                                        ->  CTE Scan on related_authorships related_authorships_1  (cost=0.00..24.08 rows=1204 width=12)
                                                  ->  Sort  (cost=85.69..88.70 rows=1204 width=36)
                                                        Sort Key: related_authors.id
                                                        ->  CTE Scan on related_authors  (cost=0.00..24.08 rows=1204 width=36)
      InitPlan 5 (returns $4)
        ->  Aggregate  (cost=27.09..27.10 rows=1 width=32)
              ->  CTE Scan on related_authorships related_authorships_2  (cost=0.00..24.08 rows=1204 width=32)
      InitPlan 6 (returns $5)
        ->  Aggregate  (cost=27.09..27.10 rows=1 width=32)
              ->  CTE Scan on related_authors related_authors_1  (cost=0.00..24.08 rows=1204 width=88)
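Question: am I missing any obvious indexes or other optimizations, other than for the initial LIKE condition?

The second approach is to join everything first and then extract each entity type, which is admittedly a bit awkward: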
    WITH joined AS (
      -- Use row/composite values to keep things separate
      SELECT books, authorships, authors
      FROM (SELECT id, title FROM books) books
      LEFT JOIN (SELECT id, book_id, author_id FROM authorships) authorships
        ON books.id = authorships.book_id
      LEFT JOIN (SELECT id, name FROM authors) authors
        ON authors.id = authorships.author_id
      WHERE title LIKE 'The %'
    ),
    related_authorships AS (
      SELECT DISTINCT ON ((authorships).id) (authorships).*
      FROM joined
      WHERE (authorships).id IS NOT NULL
    ),
    related_authors AS (
      SELECT DISTINCT ON ((authors).id) (authors).*
      FROM joined
      WHERE (authors).id IS NOT NULL
    )
    SELECT
      (SELECT json_agg(books ORDER BY first_author_name)
       FROM (
         SELECT DISTINCT ON ((books).id) books, (authors).name AS first_author_name
         FROM joined
         ORDER BY (books).id, (authorships).id
       ) sub
      ) books,
      (SELECT json_agg(related_authorships.* ORDER BY id) FROM related_authorships) authorships,
      (SELECT json_agg(related_authors.* ORDER BY name) FROM related_authors) authors;
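The parenthesized `(books).id` style may look unusual, so here is a small sketch of the row-value idea on its own. The inline VALUES rows and the alias `v` are invented for illustration; the subquery-with-alias shape mirrors the query above.

```sql
WITH joined AS (
    -- Selecting a table alias by name yields the whole row as one composite value.
    SELECT books, authorships
    FROM (SELECT * FROM (VALUES (1, 'The 1'), (2, 'The 2')) AS v (id, title)) books
    LEFT JOIN (SELECT * FROM (VALUES (10, 1, 7)) AS v (id, book_id, author_id)) authorships
           ON books.id = authorships.book_id
)
SELECT (books).id       AS book_id,        -- parentheses mark "books" as a column, not a table
       (books).title    AS title,
       (authorships).id AS authorship_id   -- NULL where the LEFT JOIN found no match
FROM joined;
```

Book 2 has no authorship in this sample, so its `authorship_id` should come back NULL. That is the case the `(authorships).id IS NOT NULL` filters in the query above are meant to exclude.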
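I won't paste the query plan; its estimated cost is lower than the first query's, but in practice it takes slightly longer on average (I know I could make some micro-optimizations to this particular version, but I've left it this way for clarity). Combined with the awkwardness, I prefer the first approach.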