PostgreSQL

Which indexes should be used to optimize a PostgreSQL query with a JOIN depth of 2?

  • October 25, 2016

Disclaimer: I'm relatively new to PostgreSQL.

I'd like to know how to optimize a query that performs 2 INNER JOINs. My scenario is fairly simple:

Select posts that have a photo (Posts.photo IS NOT NULL) and a Hashtag named 'dead' (Hashtags.name = 'dead').

The associations are as follows:

Posts <- PostHashtags -> Hashtags

Posts.id    = PostHashtags.postId (FK)
Hashtags.id = PostHashtags.hashtagId (FK)
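For reference, the associations above can be sketched as the following minimal schema. It is an assumption, not the asker's actual DDL: column types are guessed, and only the columns that matter for the joins and filters in the query are included.

```sql
-- Minimal sketch of the schema described above (types are assumptions).
CREATE TABLE "Users" (
  id   serial PRIMARY KEY,
  name text
);

CREATE TABLE "Posts" (
  id          serial PRIMARY KEY,
  "userId"    integer NOT NULL REFERENCES "Users" (id),
  note        text,
  photo       text,                             -- NULL when the post has no photo
  "createdAt" timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE "Hashtags" (
  id    serial PRIMARY KEY,
  name  text NOT NULL,
  count integer NOT NULL DEFAULT 0
);

-- Junction table: one row per (post, hashtag) link.
CREATE TABLE "PostHashtags" (
  id          serial PRIMARY KEY,
  "postId"    integer NOT NULL REFERENCES "Posts" (id),
  "hashtagId" integer NOT NULL REFERENCES "Hashtags" (id)
);
```

PostgreSQL automatically creates an index for each PRIMARY KEY, but not for the referencing columns of a FOREIGN KEY. That is consistent with the Seq Scan on "PostHashtags" filtered on "postId" in the plan below.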

Here is the query:

SELECT
 "Posts".*,
 "hashtags"."id" AS "hashtags.id",
 "hashtags"."count" AS "hashtags.count",
 "hashtags"."name" AS "hashtags.name",
 "hashtags"."createdAt" AS "hashtags.createdAt",
 "hashtags"."updatedAt" AS "hashtags.updatedAt",
 "hashtags"."objectId" AS "hashtags.objectId",
 "hashtags"."_etl" AS "hashtags._etl",
 "hashtags.PostHashtag"."id" AS "hashtags.PostHashtag.id",
 "hashtags.PostHashtag"."createdAt" AS "hashtags.PostHashtag.createdAt",
 "hashtags.PostHashtag"."updatedAt" AS "hashtags.PostHashtag.updatedAt",
 "hashtags.PostHashtag"."postId" AS "hashtags.PostHashtag.postId",
 "hashtags.PostHashtag"."hashtagId" AS "hashtags.PostHashtag.hashtagId",
 "hashtags.PostHashtag"."objectId" AS "hashtags.PostHashtag.objectId",
 "hashtags.PostHashtag"."_etl" AS "hashtags.PostHashtag._etl"

FROM (
 SELECT
   "Posts"."id",
   "Posts"."note",
   "Posts"."photo",
   "Posts"."createdAt",
   "user"."id" AS "user.id",
   "user"."name" AS "user.name"
 FROM "Posts" AS "Posts"

 INNER JOIN "Users" AS "user" ON "Posts"."userId" = "user"."id"

 WHERE "Posts"."photo" IS NOT NULL
 AND (
   SELECT "PostHashtags"."id" FROM "PostHashtags" AS "PostHashtags"
   INNER JOIN "Hashtags" AS "Hashtag" ON "PostHashtags"."hashtagId" = "Hashtag"."id"
   WHERE "Posts"."id" = "PostHashtags"."postId"
   LIMIT 1
 ) IS NOT NULL

 ORDER BY "Posts"."createdAt" DESC LIMIT 10
) AS "Posts"

INNER JOIN (
 "PostHashtags" AS "hashtags.PostHashtag"
 INNER JOIN "Hashtags" AS "hashtags" ON "hashtags"."id" = "hashtags.PostHashtag"."hashtagId"
)

ON "Posts"."id" = "hashtags.PostHashtag"."postId"
AND "hashtags"."name" = 'dead'

ORDER BY "Posts"."createdAt" DESC;

EXPLAIN output:

Nested Loop  (cost=886222912.89..886223769.55 rows=1 width=277)
 Join Filter: ("hashtags.PostHashtag"."postId" = "Posts".id)
 ->  Limit  (cost=886220835.39..886220835.42 rows=10 width=189)
       ->  Sort  (cost=886220835.39..886220988.88 rows=61394 width=189)
             Sort Key: "Posts"."createdAt"
             ->  Nested Loop  (cost=0.42..886219508.69 rows=61394 width=189)
                   ->  Seq Scan on "Posts"  (cost=0.00..885867917.51 rows=78196 width=177)
                         Filter: ((photo IS NOT NULL) AND ((SubPlan 1) IS NOT NULL))
                         SubPlan 1
                           ->  Limit  (cost=0.42..815.70 rows=1 width=4)
                                 ->  Nested Loop  (cost=0.42..815.70 rows=1 width=4)
                                       ->  Seq Scan on "PostHashtags"  (cost=0.00..811.25 rows=1 width=8)
                                             Filter: ("Posts".id = "postId")
                                       ->  Index Only Scan using "Hashtags_pkey" on "Hashtags" "Hashtag"  (cost=0.42..4.44 rows=1 width=4)
                                             Index Cond: (id = "PostHashtags"."hashtagId")
                   ->  Index Scan using "Users_pkey" on "Users" "user"  (cost=0.42..4.49 rows=1 width=16)
                         Index Cond: (id = "Posts"."userId")
 ->  Materialize  (cost=2077.50..2933.89 rows=1 width=88)
       ->  Hash Join  (cost=2077.50..2933.89 rows=1 width=88)
             Hash Cond: ("hashtags.PostHashtag"."hashtagId" = hashtags.id)
             ->  Seq Scan on "PostHashtags" "hashtags.PostHashtag"  (cost=0.00..721.00 rows=36100 width=40)
             ->  Hash  (cost=2077.49..2077.49 rows=1 width=48)
                   ->  Seq Scan on "Hashtags" hashtags  (cost=0.00..2077.49 rows=1 width=48)
                         Filter: ((name)::text = 'dead'::text)

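This query has been slightly simplified. It also performs OUTER JOINs on other data related to Posts, which is why the SELECT has to be done on Posts rather than, say, on PostHashtags.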

Any help translating this EXPLAIN output into useful indexes would be greatly appreciated.

My thoughts:

  1. Create an index on Posts.photo, but should it be a partial index WHERE "photo" IS NOT NULL?
  2. Create a UNIQUE index on Hashtags.name.

However, I'm not sure these are actually the bottlenecks.
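The two ideas above would look roughly like this as DDL. This is a sketch only: the index names are hypothetical, and it assumes the tables from the question already exist.

```sql
-- Idea 1 (hypothetical name): partial index that only contains posts with a photo.
CREATE INDEX posts_photo_idx ON "Posts" (photo)
WHERE photo IS NOT NULL;

-- Idea 2 (hypothetical name): unique index on the hashtag name, so that
-- WHERE name = 'dead' can be an index lookup instead of the Seq Scan on "Hashtags".
CREATE UNIQUE INDEX hashtags_name_uniq ON "Hashtags" (name);
```

Creating the unique index fails if duplicate names already exist in "Hashtags". A plain (non-unique) index on name would serve the lookup just as well in that case.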

Also consider the first answer.

Query

This does the same as your current query, only simpler and faster:

SELECT p.id, p.note, p.photo, p."createdAt",
 u.id           AS "user.id",
 u.name         AS "user.name",
 h.id           AS "hashtags.id",
 h.count        AS "hashtags.count",
 h.name         AS "hashtags.name",
 h."createdAt"  AS "hashtags.createdAt",
 h."updatedAt"  AS "hashtags.updatedAt",
 h."objectId"   AS "hashtags.objectId",
 h._etl         AS "hashtags._etl",
 ph.id          AS "hashtags.PostHashtag.id",
 ph."createdAt" AS "hashtags.PostHashtag.createdAt",
 ph."updatedAt" AS "hashtags.PostHashtag.updatedAt",
 ph."postId"    AS "hashtags.PostHashtag.postId",
 ph."hashtagId" AS "hashtags.PostHashtag.hashtagId",
 ph."objectId"  AS "hashtags.PostHashtag.objectId",
 ph._etl        AS "hashtags.PostHashtag._etl"
FROM (
   SELECT id, note, photo, "createdAt", "userId"
   FROM   "Posts" p
   WHERE  photo IS NOT NULL
   AND    EXISTS (
       SELECT 1
       FROM   "PostHashtags" ph
       WHERE  ph."postId" = p.id
       )
   ORDER  BY p."createdAt" DESC
   LIMIT  10
  ) p
JOIN   "PostHashtags" ph ON ph."postId" = p.id
JOIN   "Hashtags"     h  ON h.id = ph."hashtagId"
JOIN   "Users"        u  ON u.id = p."userId"
WHERE  h.name = 'dead'
ORDER  BY p."createdAt" DESC;

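An **EXISTS** semi-join should be faster than your subquery construct. I assume the column "PostHashtags".id is the PK and therefore can never be NULL. Also, if referential integrity is enforced by an FK constraint, there is no need to join "Hashtags" in this test.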

Indexes

Partial index on Posts

CREATE INDEX posts_foo_idx ON "Posts" ("createdAt", id)
WHERE photo IS NOT NULL;

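Note the columns: ("createdAt", id). Postgres will pick the latest posts, and I expect an index scan starting from the top (newest end) of posts_foo_idx, then testing each candidate for a matching entry in PostHashtags with the next index, using id.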
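One way to check whether the planner actually uses posts_foo_idx as described is to run EXPLAIN (ANALYZE, BUFFERS) on the inner derived table from the query above. This is a sketch that assumes posts_foo_idx has been created; note that ANALYZE really executes the statement.

```sql
-- Inner "latest 10 posts with a photo and at least one hashtag" step.
-- With posts_foo_idx in place, the hope is to see an
-- "Index Scan Backward using posts_foo_idx" feeding the LIMIT,
-- instead of the Seq Scan + Sort in the original plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, note, photo, "createdAt", "userId"
FROM   "Posts" p
WHERE  photo IS NOT NULL
AND    EXISTS (
   SELECT 1
   FROM   "PostHashtags" ph
   WHERE  ph."postId" = p.id
   )
ORDER  BY p."createdAt" DESC
LIMIT  10;
```

A B-tree index can be read in either direction, so an index on "createdAt" in default ascending order still serves ORDER BY "createdAt" DESC.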

Unique index on PostHashtags

This time we need "postId" as the first index column.
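The index definition itself is not shown in this excerpt. A plausible sketch, assuming the uniqueness is meant over the (post, hashtag) pair and using a hypothetical index name:

```sql
-- "postId" leads, so the EXISTS probe (ph."postId" = p.id) and the outer
-- join on "postId" can both use this index. Including "hashtagId" as the
-- second column also enforces at most one link per (post, hashtag) pair,
-- which is an assumption about the intended data model.
CREATE UNIQUE INDEX posthashtags_post_hashtag_uniq
ON "PostHashtags" ("postId", "hashtagId");
```

If duplicate (postId, hashtagId) rows already exist, the unique index cannot be created; a plain index on the same columns would still cover the lookups.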

The rest is mostly like the first answer.

Source: https://dba.stackexchange.com/questions/89338