Postgresql

ORDER BY 子句會影響查詢性能

  • March 13, 2018

語境:

PostgreSQL 10,users 表有 3667438 條記錄,users 表有一個叫做 social 的 JSONB,我們通常使用索引計算函式輸出的策略,因此我們可以將資訊聚合到單個索引中。函式的輸出engagement(social)是雙精度數值類型。

問題:

有問題的子句是,該數據還附加了ORDER BY engagement(social) DESC NULLS LAST一個 btree 索引。idx_in_social_engagement with DESC NULLS LAST

快速查詢:

EXPLAIN ANALYZE
SELECT  "users".* FROM "users"
WHERE (follower_count(social) < 500000)
AND (engagement(social) > 0.03)
AND (engagement(social) < 0.25)
AND (peemv(social) < 533)
ORDER BY "users"."created_at" ASC
LIMIT 12 OFFSET 0;

Limit  (cost=0.43..52.25 rows=12 width=1333) (actual time=0.113..1.625 
rows=12 loops=1)
  ->  Index Scan using created_at_idx on users  (cost=0.43..7027711.55 rows=1627352 width=1333) (actual time=0.112..1.623 rows=12 loops=1)
        Filter: ((follower_count(social) < 500000) AND (engagement(social) > '0.03'::double precision) AND (engagement(social) <  '0.25'::double precision) AND (peemv(social) > '0'::double precision) AND (peemv(social) < '533'::double precision))
        Rows Removed by Filter: 8
Planning time: 0.324 ms
Execution time: 1.639 ms

慢查詢:

EXPLAIN ANALYZE 
SELECT  "users".* FROM "users" 
WHERE (follower_count(social) < 500000) 
AND (engagement(social) > 0.03) 
AND (engagement(social) < 0.25) 
AND (peemv(social) > 0.0) 
AND (peemv(social) < 533) 
ORDER BY engagement(social) DESC NULLS LAST, "users"."created_at" ASC 
LIMIT 12 OFFSET 0;

Limit  (cost=2884438.00..2884438.03 rows=12 width=1341) (actual time=68011.728..68011.730 rows=12 loops=1)
->  Sort  (cost=2884438.00..2888506.38 rows=1627352 width=1341) (actual time=68011.727..68011.728 rows=12 loops=1)
       Sort Key: (engagement(social)) DESC NULLS LAST, created_at
       Sort Method: top-N heapsort  Memory: 45kB
       ->  Index Scan using idx_in_social_engagement on users  (cost=0.43..2847131.26 rows=1627352 width=1341) (actual time=0.082..67019.102 rows=1360633 loops=1)
           Index Cond: ((engagement(social) > '0.03'::double precision) AND (engagement(social) < '0.25'::double precision))
           Filter: ((follower_count(social) < 500000) AND (peemv(social) > '0'::double precision) AND (peemv(social) < '533'::double precision))
           Rows Removed by Filter: 85580
Planning time: 0.312 ms
Execution time: 68011.752 ms

選擇帶有 * 因為我需要儲存在每一行中的所有數據。

更新:

CREATE INDEX idx_in_social_engagement on influencers USING BTREE ( engagement(social) DESC NULLS LAST)

準確的索引定義

你的ORDER BY條款是:

engagement(social) DESC NULLS LAST, "users"."created_at" ASC

但我懷疑你的索引只是在:

engagement(social) DESC NULLS LAST

所以該指數並不能完全支撐ORDER BY

您可以在不使用JSONB或表達式索引的情況下重現相同的問題。您可以通過ORDER BY.

如果 PostgreSQL 規劃器非常聰明,它可能能夠有效地使用現有索引。它必須繼續前進,engagement(social) DESC NULLS LAST直到它收集到 12 個滿足所有其餘過濾器要求的元組。然後它將繼續前進該索引,直到它收集engagement(social)與第 12 個元組相關的所有其餘元組(並且滿足其他標準)。然後它必須重新排序所有收集到的元組ORDER BY,並將 應用於該LIMIT 12擴展和重新排序的集合。但是 PostgreSQL 規劃器並不是無限明智的。

引用自:https://dba.stackexchange.com/questions/199968