PostgreSQL
Which indexes should I use to optimize a PostgreSQL query with a JOIN depth of 2?
Disclaimer: I'm relatively new to PostgreSQL.
I'm wondering how to optimize a query that does 2 INNER JOINs. My scenario is fairly simple: select Posts with a photo (Posts.photo IS NOT NULL) and a Hashtag with the name 'dead' (Hashtags.name = 'dead'). The associations are as follows:
Posts <- PostHashtags -> Hashtags
Posts.id = PostHashtags.postId (FK)
Hashtags.id = PostHashtags.hashtagId (FK)
Here's the query:
SELECT "Posts".*,
       "hashtags"."id" AS "hashtags.id",
       "hashtags"."count" AS "hashtags.count",
       "hashtags"."name" AS "hashtags.name",
       "hashtags"."createdAt" AS "hashtags.createdAt",
       "hashtags"."updatedAt" AS "hashtags.updatedAt",
       "hashtags"."objectId" AS "hashtags.objectId",
       "hashtags"."_etl" AS "hashtags._etl",
       "hashtags.PostHashtag"."id" AS "hashtags.PostHashtag.id",
       "hashtags.PostHashtag"."createdAt" AS "hashtags.PostHashtag.createdAt",
       "hashtags.PostHashtag"."updatedAt" AS "hashtags.PostHashtag.updatedAt",
       "hashtags.PostHashtag"."postId" AS "hashtags.PostHashtag.postId",
       "hashtags.PostHashtag"."hashtagId" AS "hashtags.PostHashtag.hashtagId",
       "hashtags.PostHashtag"."objectId" AS "hashtags.PostHashtag.objectId",
       "hashtags.PostHashtag"."_etl" AS "hashtags.PostHashtag._etl"
FROM (
    SELECT "Posts"."id", "Posts"."note", "Posts"."photo", "Posts"."createdAt",
           "user"."id" AS "user.id", "user"."name" AS "user.name"
    FROM "Posts" AS "Posts"
    INNER JOIN "Users" AS "user" ON "Posts"."userId" = "user"."id"
    WHERE "Posts"."photo" IS NOT NULL
      AND (
        SELECT "PostHashtags"."id"
        FROM "PostHashtags" AS "PostHashtags"
        INNER JOIN "Hashtags" AS "Hashtag" ON "PostHashtags"."hashtagId" = "Hashtag"."id"
        WHERE "Posts"."id" = "PostHashtags"."postId"
        LIMIT 1
      ) IS NOT NULL
    ORDER BY "Posts"."createdAt" DESC
    LIMIT 10
) AS "Posts"
INNER JOIN (
    "PostHashtags" AS "hashtags.PostHashtag"
    INNER JOIN "Hashtags" AS "hashtags" ON "hashtags"."id" = "hashtags.PostHashtag"."hashtagId"
) ON "Posts"."id" = "hashtags.PostHashtag"."postId" AND "hashtags"."name" = 'dead'
ORDER BY "Posts"."createdAt" DESC;
EXPLAIN results:
Nested Loop  (cost=886222912.89..886223769.55 rows=1 width=277)
  Join Filter: ("hashtags.PostHashtag"."postId" = "Posts".id)
  ->  Limit  (cost=886220835.39..886220835.42 rows=10 width=189)
        ->  Sort  (cost=886220835.39..886220988.88 rows=61394 width=189)
              Sort Key: "Posts"."createdAt"
              ->  Nested Loop  (cost=0.42..886219508.69 rows=61394 width=189)
                    ->  Seq Scan on "Posts"  (cost=0.00..885867917.51 rows=78196 width=177)
                          Filter: ((photo IS NOT NULL) AND ((SubPlan 1) IS NOT NULL))
                          SubPlan 1
                            ->  Limit  (cost=0.42..815.70 rows=1 width=4)
                                  ->  Nested Loop  (cost=0.42..815.70 rows=1 width=4)
                                        ->  Seq Scan on "PostHashtags"  (cost=0.00..811.25 rows=1 width=8)
                                              Filter: ("Posts".id = "postId")
                                        ->  Index Only Scan using "Hashtags_pkey" on "Hashtags" "Hashtag"  (cost=0.42..4.44 rows=1 width=4)
                                              Index Cond: (id = "PostHashtags"."hashtagId")
                    ->  Index Scan using "Users_pkey" on "Users" "user"  (cost=0.42..4.49 rows=1 width=16)
                          Index Cond: (id = "Posts"."userId")
  ->  Materialize  (cost=2077.50..2933.89 rows=1 width=88)
        ->  Hash Join  (cost=2077.50..2933.89 rows=1 width=88)
              Hash Cond: ("hashtags.PostHashtag"."hashtagId" = hashtags.id)
              ->  Seq Scan on "PostHashtags" "hashtags.PostHashtag"  (cost=0.00..721.00 rows=36100 width=40)
              ->  Hash  (cost=2077.49..2077.49 rows=1 width=48)
                    ->  Seq Scan on "Hashtags" hashtags  (cost=0.00..2077.49 rows=1 width=48)
                          Filter: ((name)::text = 'dead'::text)
This query has been simplified slightly. It also does OUTER JOINs on other data related to Posts, which is why the SELECT must be on Posts and not, say, PostHashtags.
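Any help translating the EXPLAIN output into useful indexes would be appreciated. My thoughts:
- Create an index on Posts.photo, but should it be a partial index WHERE "photo" IS NOT NULL?
- Create a UNIQUE index on Hashtags.name.
I'm not sure these are necessarily the bottlenecks, though.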
Also consider the first answer.
Query
This does the same as your current query, only simpler and faster:
SELECT p.id, p.note, p.photo, p."createdAt",
       u.id AS "user.id",
       u.name AS "user.name",
       h.id AS "hashtags.id",
       h.count AS "hashtags.count",
       h.name AS "hashtags.name",
       h."createdAt" AS "hashtags.createdAt",
       h."updatedAt" AS "hashtags.updatedAt",
       h."objectId" AS "hashtags.objectId",
       h._etl AS "hashtags._etl",
       ph.id AS "hashtags.PostHashtag.id",
       ph."createdAt" AS "hashtags.PostHashtag.createdAt",
       ph."updatedAt" AS "hashtags.PostHashtag.updatedAt",
       ph."postId" AS "hashtags.PostHashtag.postId",
       ph."hashtagId" AS "hashtags.PostHashtag.hashtagId",
       ph."objectId" AS "hashtags.PostHashtag.objectId",
       ph._etl AS "hashtags.PostHashtag._etl"
FROM (
    SELECT id, note, photo, "createdAt", "userId"
    FROM "Posts" p
    WHERE photo IS NOT NULL
      AND EXISTS (
        SELECT 1
        FROM "PostHashtags" ph
        WHERE ph."postId" = p.id
      )
    ORDER BY p."createdAt" DESC
    LIMIT 10
) p
JOIN "PostHashtags" ph ON ph."postId" = p.id
JOIN "Hashtags" h ON h.id = ph."hashtagId"
JOIN "Users" u ON u.id = p."userId"
WHERE h.name = 'dead'
ORDER BY p."createdAt" DESC;
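The EXISTS semi-join should be faster than your subquery construct. I am assuming the column "PostHashtags".id is the PK and cannot be NULL itself. Also, there is no need to join to "Hashtags" in this test if referential integrity is enforced with FK constraints.
Indexes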
Partial index on Posts
CREATE INDEX posts_foo_idx ON "Posts" ("createdAt", id) WHERE photo IS NOT NULL;
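Note the columns: ("createdAt", id). Postgres will go for the latest posts. I'd expect an index scan on posts_foo_idx from the top, and then a test, using id and the next index, for whether a matching entry exists in PostHashtags.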
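To check whether the planner actually walks the partial index as described, the inner subquery can be run on its own under EXPLAIN. This is only a sketch: it assumes posts_foo_idx has been created as above, and the resulting plan depends on your data and statistics.

```sql
-- Refresh planner statistics so the new partial index is costed realistically.
ANALYZE "Posts";

-- Inner subquery from the rewritten query, run in isolation.
-- EXPLAIN (ANALYZE, BUFFERS) executes the statement and reports actual timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.id, p.note, p.photo, p."createdAt", p."userId"
FROM   "Posts" p
WHERE  p.photo IS NOT NULL
AND    EXISTS (
   SELECT 1
   FROM   "PostHashtags" ph
   WHERE  ph."postId" = p.id
   )
ORDER  BY p."createdAt" DESC
LIMIT  10;
```

If the index is chosen, the plan should contain a backward index scan on posts_foo_idx instead of the Seq Scan plus Sort seen in the original plan. If it still shows a Seq Scan on "Posts", the index is not being used for this query shape.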
Unique index on PostHashtags
This time we need "postId" first in the index. The rest is mostly like in the first answer.
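A minimal sketch of what such an index might look like. The index names are hypothetical, and the second column and the uniqueness are assumptions: this only works if a given hashtag is attached to a given post at most once. The index on "Hashtags" follows the idea from the question and the Seq Scan on "Hashtags" in the plan, and assumes hashtag names are unique.

```sql
-- Hypothetical name; "postId" leads so the EXISTS test and the join on
-- ph."postId" = p.id can use the index.
-- Assumes each (postId, hashtagId) pair occurs at most once.
CREATE UNIQUE INDEX posthashtags_postid_hashtagid_idx
   ON "PostHashtags" ("postId", "hashtagId");

-- Hypothetical name; supports the filter h.name = 'dead'.
-- Assumes names are unique; otherwise use a plain CREATE INDEX.
CREATE UNIQUE INDEX hashtags_name_idx
   ON "Hashtags" (name);
```

If either uniqueness assumption does not hold for your data, CREATE UNIQUE INDEX will fail on the existing duplicates, and a non-unique index on the same columns is the fallback.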