Optimizing a complex Postgres query with a filter

August 18, 2017

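I asked about this query before and got a very insightful answer. However, I now want to narrow the query down further (on PostgreSQL 9.6.3), and it has started getting slow again. I'm not sure a partial index would help, because the new condition is not on a boolean.

Here is the base query, which performs very well: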

EXPLAIN ANALYZE
SELECT posts.*
FROM unnest('{17858,50909,52659,50914,50916,51696,52661,52035,17860,53315,54027,53305}'::int []) s(source_id),
 LATERAL
 (SELECT "posts".*
  FROM "posts"
  WHERE (source_id = s.source_id)
    AND ("posts"."deleted_at" IS NOT NULL)
    AND "posts"."rejected_at" IS NULL
  ORDER BY posts.external_created_at DESC
  LIMIT 100) posts
ORDER BY posts.external_created_at DESC
LIMIT 100
OFFSET 1;
                                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=30895.79..30896.04 rows=100 width=1043) (actual time=5.299..5.337 rows=100 loops=1)
  ->  Sort  (cost=30895.78..30920.78 rows=10000 width=1043) (actual time=5.297..5.325 rows=101 loops=1)
        Sort Key: posts.external_created_at DESC
        Sort Method: top-N heapsort  Memory: 110kB
        ->  Nested Loop  (cost=0.56..30512.87 rows=10000 width=1043) (actual time=0.085..4.077 rows=738 loops=1)
              ->  Function Scan on unnest s  (cost=0.00..1.00 rows=100 width=4) (actual time=0.011..0.016 rows=12 loops=1)
              ->  Limit  (cost=0.56..303.12 rows=100 width=1043) (actual time=0.018..0.298 rows=62 loops=12)
                    ->  Index Scan using index_posts_for_moderation_queue on posts  (cost=0.56..7628.00 rows=2521 width=1043) (actual time=0.017..0.285 rows=62 loops=12)
                          Index Cond: (source_id = s.source_id)
Planning time: 0.443 ms
Execution time: 5.433 ms
(11 rows)

Here is the modified version with the extra filter, which is much slower:

EXPLAIN ANALYZE
SELECT posts.*
FROM unnest('{17858,50909,52659,50914,50916,51696,52661,52035,17860,53315,54027,53305}'::int []) s(source_id),
 LATERAL
 (SELECT "posts".*
  FROM "posts"
  WHERE (source_id = s.source_id)
    AND ("posts"."deleted_at" IS NOT NULL)
    AND "posts"."deleted_by" = 'User'
    AND "posts"."rejected_at" IS NULL
  ORDER BY posts.external_created_at DESC
  LIMIT 100) posts
ORDER BY posts.external_created_at DESC
LIMIT 100
OFFSET 0;
                                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=551390.03..551390.28 rows=100 width=1043) (actual time=769.522..769.522 rows=0 loops=1)
  ->  Sort  (cost=551390.03..551391.78 rows=700 width=1043) (actual time=769.521..769.521 rows=0 loops=1)
        Sort Key: posts.external_created_at DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Nested Loop  (cost=5513.47..551363.28 rows=700 width=1043) (actual time=769.508..769.508 rows=0 loops=1)
              ->  Function Scan on unnest s  (cost=0.00..1.00 rows=100 width=4) (actual time=0.012..0.022 rows=12 loops=1)
              ->  Limit  (cost=5513.47..5513.48 rows=7 width=1043) (actual time=64.122..64.122 rows=0 loops=12)
                    ->  Sort  (cost=5513.47..5513.48 rows=7 width=1043) (actual time=64.120..64.120 rows=0 loops=12)
                          Sort Key: posts.external_created_at DESC
                          Sort Method: quicksort  Memory: 25kB
                          ->  Bitmap Heap Scan on posts  (cost=5485.28..5513.37 rows=7 width=1043) (actual time=64.104..64.104 rows=0 loops=12)
                                Recheck Cond: ((source_id = s.source_id) AND (deleted_at IS NOT NULL) AND (rejected_at IS NULL) AND ((deleted_by)::text = 'User'::text))
                                Rows Removed by Index Recheck: 1
                                Heap Blocks: exact=9
                                ->  BitmapAnd  (cost=5485.28..5485.28 rows=7 width=0) (actual time=64.098..64.098 rows=0 loops=12)
                                      ->  Bitmap Index Scan on index_posts_for_moderation_queue  (cost=0.00..59.47 rows=2521 width=0) (actual time=0.028..0.028 rows=168 loops=12)
                                            Index Cond: (source_id = s.source_id)
                                      ->  Bitmap Index Scan on index_posts_on_deleted_by  (cost=0.00..5425.55 rows=291865 width=0) (actual time=76.855..76.855 rows=334200 loops=10)
                                            Index Cond: ((deleted_by)::text = 'User'::text)
Planning time: 0.348 ms
Execution time: 769.660 ms
(21 rows)

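The only difference between the two is that the second one adds AND "posts"."deleted_by" = 'User' to the lateral subquery.

The problem is that the value 'User' is not a boolean; it can be any value.

Is there a way to optimize this query further so that it stays fast even when the deleted_by condition is set?

Here are the table structure, indexes, and settings: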

CREATE TABLE posts (
   id integer NOT NULL,
   source_id integer,
   message text,
   image text,
   external_id text,
   created_at timestamp without time zone,
   updated_at timestamp without time zone,
   external text,
   like_count integer DEFAULT 0 NOT NULL,
   comment_count integer DEFAULT 0 NOT NULL,
   external_created_at timestamp without time zone,
   deleted_at timestamp without time zone,
   poster_name character varying(255),
   poster_image text,
   poster_url character varying(255),
   poster_id text,
   position integer,
   location character varying(255),
   description text,
   video text,
   rejected_at timestamp without time zone,
   deleted_by character varying(255),
   height integer,
   width integer
);

CREATE INDEX index_posts_on_source_id_and_external_created_at ON posts USING btree (source_id, external_created_at DESC) WHERE deleted_at IS NOT NULL AND rejected_at IS NULL;
CREATE INDEX index_posts_on_deleted_at ON posts USING btree (deleted_at);
CREATE INDEX index_posts_on_deleted_by ON posts USING btree (deleted_by);
CREATE INDEX index_posts_on_source_id ON posts USING btree (source_id);

The first index above is the result of the answer to my previous question.

Postgres memory settings:

name, setting, unit
'default_statistics_target','100',''
'effective_cache_size','16384','8kB'
'maintenance_work_mem','16384','kB'
'max_connections','100',''
'random_page_cost','4',NULL
'seq_page_cost','1',NULL
'shared_buffers','16384','8kB'
'work_mem','1024','kB'

Database statistics:

Total Posts: 20,997,027
Posts where deleted_at is null: 15,665,487
Distinct source_id's: 22,245
Max number of rows per single source_id: 1,543,950
Min number of rows per single source_id: 1
Most source_ids in a single query: 21
Distinct external_created_at: 11,146,151

Edit

I tried the simplified answer from Evan with a different set of source IDs, and it is slow:

EXPLAIN ANALYZE
SELECT *
FROM posts AS p
WHERE source_id IN (159469,120669,120668,120670,120671,120674,120662,120661,120664,109450,109448,109447,108039,159468,157810)
 AND deleted_at IS NOT NULL
 AND deleted_by = 'Filter'
 AND rejected_at IS NULL
ORDER BY external_created_at DESC
LIMIT 100;
                                                                                                                               QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=74114.14..74114.19 rows=100 width=1060) (actual time=2794.981..2794.981 rows=0 loops=1)
  ->  Sort  (cost=74114.14..74115.48 rows=2678 width=1060) (actual time=2794.981..2794.981 rows=0 loops=1)
        Sort Key: external_created_at DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Bitmap Heap Scan on posts p  (cost=68759.42..74093.67 rows=2678 width=1060) (actual time=2794.977..2794.977 rows=0 loops=1)
              Recheck Cond: ((source_id = ANY ('{159469,120669,120668,120670,120671,120674,120662,120661,120664,109450,109448,109447,108039,159468,157810}'::integer[])) AND (deleted_at IS NOT NULL) AND (rejected_at IS NULL) AND ((deleted_by)::text = 'Filter'::text))
              Rows Removed by Index Recheck: 32326
              Heap Blocks: exact=16019
              ->  BitmapAnd  (cost=68759.42..68759.42 rows=2678 width=0) (actual time=2745.376..2745.376 rows=0 loops=1)
                    ->  Bitmap Index Scan on index_posts_for_moderation_queue  (cost=0.00..830.64 rows=52637 width=0) (actual time=42.319..42.319 rows=272192 loops=1)
                          Index Cond: (source_id = ANY ('{159469,120669,120668,120670,120671,120674,120662,120661,120664,109450,109448,109447,108039,159468,157810}'::integer[]))
                    ->  Bitmap Index Scan on index_posts_on_deleted_by  (cost=0.00..67928.46 rows=6942897 width=0) (actual time=2651.123..2651.123 rows=7863994 loops=1)
                          Index Cond: ((deleted_by)::text = 'Filter'::text)
Planning time: 0.856 ms
Execution time: 2795.033 ms
(15 rows)

The reason I'm using LATERAL is explained in another question of mine, where this query was optimized earlier.

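Just right off the query, fix a few things. Try this.
Stop using double quotes. None of that should be double-quoted.
Never say ", LATERAL". That's SQL-89 JOIN syntax. Time to update it. Those are all CROSS JOIN LATERAL.
Don't use string literals for ints. Just use ARRAY[].
Don't CROSS JOIN LATERAL when you can rewrite it as an INNER JOIN.
Don't use INNER JOIN on literals when you can rewrite it as a WHERE x IN ().
Don't use WHERE x IN when the list comes from SQL. Use EXISTS (which doesn't apply here, but as long as I'm ranting...).
Try this.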
EXPLAIN ANALYZE
SELECT p.*
FROM posts AS p
WHERE p.source_id IN (17858,50909,52659,50914,50916,51696,52661,52035,17860,53315,54027,53305)
 AND p.deleted_at IS NOT NULL
 AND p.deleted_by = 'User'
 AND p.rejected_at IS NULL
ORDER BY p.external_created_at DESC
LIMIT 100;
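If the per-source LIMIT from the LATERAL form is still needed (the asker says there is a reason for it from the earlier question), the first three points above could be applied without dropping LATERAL. This is only a sketch of that rewrite, not a tested replacement; the alias names s and p are my own choice.

```sql
-- Sketch: the original LATERAL query with no double quotes,
-- an explicit CROSS JOIN LATERAL, and ARRAY[] instead of a string literal.
EXPLAIN ANALYZE
SELECT p.*
FROM unnest(ARRAY[17858,50909,52659,50914,50916,51696,52661,52035,17860,53315,54027,53305]) AS s(source_id)
CROSS JOIN LATERAL (
    SELECT *
    FROM posts
    WHERE posts.source_id = s.source_id
      AND posts.deleted_at IS NOT NULL
      AND posts.deleted_by = 'User'
      AND posts.rejected_at IS NULL
    ORDER BY posts.external_created_at DESC
    LIMIT 100              -- at most 100 rows per source_id
) AS p
ORDER BY p.external_created_at DESC
LIMIT 100;                 -- overall top 100
```

This changes only the syntax, so by itself it should not be expected to change the plan.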
Update
Your big problem with that query is simply deleted_by. Here is my suggestion.
These are your current indexes:
CREATE INDEX index_posts_on_source_id_and_external_created_at ON posts USING btree (source_id, external_created_at DESC) WHERE deleted_at IS NOT NULL AND rejected_at IS NULL;
CREATE INDEX index_posts_on_deleted_at ON posts USING btree (deleted_at);
CREATE INDEX index_posts_on_deleted_by ON posts USING btree (deleted_by);
CREATE INDEX index_posts_on_source_id ON posts USING btree (source_id);
There is no reason to have both index_posts_on_source_id_and_external_created_at and index_posts_on_source_id. Both lead with source_id. So drop index_posts_on_source_id; it just slows down inserts.
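A minimal sketch of that drop, assuming nothing else depends on the plain source_id index. One thing worth checking first: index_posts_on_source_id_and_external_created_at is a partial index, so it can only serve queries whose WHERE clause implies deleted_at IS NOT NULL AND rejected_at IS NULL. Queries on source_id without those conditions would lose their index.

```sql
-- CONCURRENTLY avoids blocking writes on posts while the index is removed.
-- It cannot be run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS index_posts_on_source_id;
```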
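Second is your big problem, deleted_by. There are two ways to solve it.
One is a composite index, so we don't have to do two index scans and bitmap-AND them together.
The other is a predicate (partial) index.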
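The two options might look roughly like this. Both index names are hypothetical, and the column order and predicates are my assumptions based on the queries in the question, so compare plans with EXPLAIN ANALYZE before keeping either one.

```sql
-- Option 1 (composite): equality columns first, then the sort column,
-- keeping the same predicate as the existing moderation index.
CREATE INDEX CONCURRENTLY index_posts_on_source_deleted_by_created  -- hypothetical name
    ON posts USING btree (source_id, deleted_by, external_created_at DESC)
    WHERE deleted_at IS NOT NULL AND rejected_at IS NULL;

-- Option 2 (predicate/partial): fold one deleted_by value into the predicate.
-- Only helps queries that filter on exactly this value.
CREATE INDEX CONCURRENTLY index_posts_deleted_by_user  -- hypothetical name
    ON posts USING btree (source_id, external_created_at DESC)
    WHERE deleted_at IS NOT NULL
      AND rejected_at IS NULL
      AND deleted_by = 'User';
```

Since the question says deleted_by can hold any value, option 1 is the more general choice. Option 2 needs one index per value and only pays off for values that are queried often.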
If deleted_by can only take a handful of values, you could consider making it an enum type and getting rid of the string comparison.
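A rough sketch of the enum conversion follows. The type name post_deleted_by is hypothetical, and the value list contains only the two values that appear in the question ('User' and 'Filter'), which is an assumption. The ALTER fails if any row holds a value not listed in the type.

```sql
-- Hypothetical type; extend the list to every value actually stored in deleted_by.
CREATE TYPE post_deleted_by AS ENUM ('User', 'Filter');

-- Rewrites the whole table and rebuilds its indexes under an exclusive lock,
-- so on a table of about 21 million rows this needs a maintenance window.
ALTER TABLE posts
    ALTER COLUMN deleted_by TYPE post_deleted_by
    USING deleted_by::text::post_deleted_by;  -- varchar has no direct cast to an enum
```

Existing comparisons such as deleted_by = 'User' keep working, because the literal is resolved to the enum type.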

Source: https://dba.stackexchange.com/questions/183795
