Postgresql

如果我將它分成 3 個較小的查詢,為什麼我的大 Postgres 查詢會更快?

  • August 21, 2022

我有一個大/複雜的查詢(帶有一個 OR 子句和 2 個具有多個連接和一個 IN 條件的子查詢)。執行大查詢很慢。

但是,如果我將我的大查詢分解為 3 個由我的應用程式碼連結在一起的較小查詢,則整體執行時間會快得多。

這是為什麼?我認為一般來說使用一個大查詢會更好,因為它讓 Postgres 查詢規劃器可以完全控制優化,並且網路數據庫呼叫減少到只有一個呼叫。但看起來我在某些情況下可能是錯的。

這是完整的細節。

大查詢

詢問:

SELECT users.*
FROM users
WHERE (users.id IN
        (SELECT users.id
         FROM users
         INNER JOIN company_users ON users.id = company_users.user_id
         INNER JOIN bag_contributors ON bag_contributors.user_id = users.id
         INNER JOIN bags ON bags.deleted = FALSE
         AND bags.id = bag_contributors.bag_id
         INNER JOIN bag_items ON bag_items.deleted = FALSE
         AND bag_items.bag_id = bags.id
         WHERE company_users.deleted_at IS NULL
           AND company_users.company_id = 1595
           AND bag_items.item_id IN
             (SELECT items.id
              FROM items
              WHERE items.slug IN ('gray-hat', 'usb-cable')))
      OR users.id IN
        (SELECT users.id
         FROM users
         INNER JOIN company_users ON users.id = company_users.user_id
         INNER JOIN bags ON bags.deleted = FALSE
         AND bags.user_type = 'User'
         AND bags.user_id = users.id
         INNER JOIN bag_items ON bag_items.deleted = FALSE
         AND bag_items.bag_id = bags.id
         WHERE company_users.deleted_at IS NULL
           AND company_users.company_id = 1595
           AND bag_items.item_id IN
             (SELECT items.id
              FROM items
              WHERE items.slug IN ('gray-hat', 'usb-cable'))))
ORDER BY users.pending_user ASC

解釋分析結果:

Index Scan using index_users_on_pending_user on users  (cost=1293.90..286963.01 rows=941157 width=2029) (actual time=7.880..1062.384 rows=12 loops=1)
 Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
 Rows Removed by Filter: 1257498
 SubPlan 1
   ->  Nested Loop  (cost=2.54..573.48 rows=1 width=4) (actual time=0.301..5.760 rows=32 loops=1)
         Join Filter: (bag_items.item_id = items.id)
         Rows Removed by Join Filter: 4078
         ->  Nested Loop  (cost=2.13..566.53 rows=56 width=8) (actual time=0.200..3.965 rows=4110 loops=1)
               ->  Nested Loop  (cost=1.70..550.87 rows=6 width=16) (actual time=0.189..0.680 rows=63 loops=1)
                     ->  Nested Loop  (cost=1.27..542.72 rows=6 width=12) (actual time=0.147..0.400 rows=70 loops=1)
                           ->  Nested Loop  (cost=0.85..432.16 rows=154 width=12) (actual time=0.027..0.227 rows=28 loops=1)
                                 ->  Index Scan using index_company_users_on_company_id_and_deleted_at on company_users  (cost=0.43..173.33 rows=154 width=8) (actual time=0.016..0.048 rows=28 loops=1)
                                       Index Cond: ((company_id = 1595) AND (deleted_at IS NULL))
                                 ->  Index Only Scan using users_pkey on users users_1  (cost=0.43..1.68 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=28)
                                       Index Cond: (id = company_users.user_id)
                                       Heap Fetches: 19
                           ->  Index Only Scan using index_bag_contributors_on_user_id_and_bag_id on bag_contributors  (cost=0.41..0.61 rows=11 width=16) (actual time=0.004..0.005 rows=2 loops=28)
                                 Index Cond: (user_id = users_1.id)
                                 Heap Fetches: 59
                     ->  Index Scan using bags_pkey on bags  (cost=0.42..1.36 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=70)
                           Index Cond: (id = bag_contributors.bag_id)
                           Filter: (NOT deleted)
                           Rows Removed by Filter: 0
               ->  Index Scan using index_bag_items_on_bag_id on bag_items  (cost=0.43..1.93 rows=68 width=8) (actual time=0.004..0.040 rows=65 loops=63)
                     Index Cond: (bag_id = bags.id)
                     Filter: (NOT deleted)
                     Rows Removed by Filter: 2
         ->  Materialize  (cost=0.41..5.27 rows=2 width=4) (actual time=0.000..0.000 rows=1 loops=4110)
               ->  Index Scan using index_items_on_slug on items  (cost=0.41..5.26 rows=2 width=4) (actual time=0.048..0.077 rows=1 loops=1)
                     Index Cond: (slug = ANY ('{gray-hat,jeans}'::citext[]))
 SubPlan 2
   ->  Hash Join  (cost=7.00..719.99 rows=1 width=4) (actual time=1.119..1.703 rows=2 loops=1)
         Hash Cond: (bag_items_1.item_id = items_1.id)
         ->  Nested Loop  (cost=1.71..712.93 rows=676 width=8) (actual time=0.070..1.538 rows=766 loops=1)
               ->  Nested Loop  (cost=1.28..519.52 rows=72 width=8) (actual time=0.060..0.525 rows=52 loops=1)
                     ->  Nested Loop  (cost=0.85..432.16 rows=154 width=12) (actual time=0.046..0.236 rows=28 loops=1)
                           ->  Index Scan using index_company_users_on_company_id_and_deleted_at on company_users company_users_1  (cost=0.43..173.33 rows=154 width=8) (actual time=0.013..0.042 rows=28 loops=1)
                                 Index Cond: ((company_id = 1595) AND (deleted_at IS NULL))
                           ->  Index Only Scan using users_pkey on users users_2  (cost=0.43..1.68 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=28)
                                 Index Cond: (id = company_users_1.user_id)
                                 Heap Fetches: 19
                     ->  Index Scan using index_bags_on_user_id on bags bags_1  (cost=0.42..0.55 rows=2 width=8) (actual time=0.007..0.009 rows=2 loops=28)
                           Index Cond: (user_id = users_2.id)
                           Filter: ((NOT deleted) AND ((user_type)::text = 'User'::text))
                           Rows Removed by Filter: 0
               ->  Index Scan using index_bag_items_on_bag_id on bag_items bag_items_1  (cost=0.43..2.01 rows=68 width=8) (actual time=0.008..0.017 rows=15 loops=52)
                     Index Cond: (bag_id = bags_1.id)
                     Filter: (NOT deleted)
                     Rows Removed by Filter: 9
         ->  Hash  (cost=5.26..5.26 rows=2 width=4) (actual time=0.072..0.073 rows=1 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Index Scan using index_items_on_slug on items items_1  (cost=0.41..5.26 rows=2 width=4) (actual time=0.041..0.069 rows=1 loops=1)
                     Index Cond: (slug = ANY ('{gray-hat,jeans}'::citext[]))
Planning Time: 6.049 ms
Execution Time: 1062.577 ms

小查詢

總共有 3 個小查詢。應用程式碼使用前 2 個查詢將 ID 載入到記憶體中,並使用第 3 個查詢載入具有這些 ID 的使用者。

查詢:

SELECT users.id
FROM users
INNER JOIN company_users ON users.id = company_users.user_id
INNER JOIN bag_contributors ON bag_contributors.user_id = users.id
INNER JOIN bags ON bags.deleted = FALSE
AND bags.id = bag_contributors.bag_id
INNER JOIN bag_items ON bag_items.deleted = FALSE
AND bag_items.bag_id = bags.id
WHERE company_users.deleted_at IS NULL
 AND company_users.company_id = 1595
 AND bag_items.item_id IN
   (SELECT items.id
    FROM items
    WHERE items.slug IN ('gray-hat', 'jeans'))


---------------------------------------------------------------------

SELECT users.id
FROM users
INNER JOIN company_users ON users.id = company_users.user_id
INNER JOIN bags ON bags.deleted = FALSE
AND bags.user_type = 'User'
AND bags.user_id = users.id
INNER JOIN bag_items ON bag_items.deleted = FALSE
AND bag_items.bag_id = bags.id
WHERE company_users.deleted_at IS NULL
 AND company_users.company_id = 1595
 AND bag_items.item_id IN
   (SELECT items.id
    FROM items
    WHERE items.slug IN ('gray-hat', 'jeans'))


---------------------------------------------------------------------

SELECT users.*
FROM users
WHERE (users.id IN (2, 2, 2, 2, 234196, 234196, 234196, 234196, 234196, 234196, 234196, 234711, 605774, 648863, 912565, 912565, 1123517, 1123517, 1123517, 1097143, 1097143, 1097143, 257253, 257253, 257253, 257253, 257253, 257253, 257253, 1026609, 968371, 1107724)
      OR users.id IN (2, 257253))
ORDER BY users.pending_user ASC

解釋分析結果:

Nested Loop  (cost=2.54..573.48 rows=1 width=4) (actual time=18.414..35.196 rows=32 loops=1)
 Join Filter: (bag_items.item_id = items.id)
 Rows Removed by Join Filter: 4078
 ->  Nested Loop  (cost=2.13..566.53 rows=56 width=8) (actual time=17.887..33.104 rows=4110 loops=1)
       ->  Nested Loop  (cost=1.70..550.87 rows=6 width=16) (actual time=17.093..22.972 rows=63 loops=1)
             ->  Nested Loop  (cost=1.27..542.72 rows=6 width=12) (actual time=17.015..22.613 rows=70 loops=1)
                   ->  Nested Loop  (cost=0.85..432.16 rows=154 width=12) (actual time=9.570..19.868 rows=28 loops=1)
                         ->  Index Scan using index_company_users_on_company_id_and_deleted_at on company_users  (cost=0.43..173.33 rows=154 width=8) (actual time=9.379..18.490 rows=28 loops=1)
                               Index Cond: ((company_id = 1595) AND (deleted_at IS NULL))
                         ->  Index Only Scan using users_pkey on users  (cost=0.43..1.68 rows=1 width=4) (actual time=0.047..0.047 rows=1 loops=28)
                               Index Cond: (id = company_users.user_id)
                               Heap Fetches: 19
                   ->  Index Only Scan using index_bag_contributors_on_user_id_and_bag_id on bag_contributors  (cost=0.41..0.61 rows=11 width=16) (actual time=0.045..0.097 rows=2 loops=28)
                         Index Cond: (user_id = users.id)
                         Heap Fetches: 67
             ->  Index Scan using bags_pkey on bags  (cost=0.42..1.36 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=70)
                   Index Cond: (id = bag_contributors.bag_id)
                   Filter: (NOT deleted)
                   Rows Removed by Filter: 0
       ->  Index Scan using index_bag_items_on_bag_id on bag_items  (cost=0.43..1.93 rows=68 width=8) (actual time=0.083..0.149 rows=65 loops=63)
             Index Cond: (bag_id = bags.id)
             Filter: (NOT deleted)
             Rows Removed by Filter: 2
 ->  Materialize  (cost=0.41..5.27 rows=2 width=4) (actual time=0.000..0.000 rows=1 loops=4110)
       ->  Index Scan using index_items_on_slug on items  (cost=0.41..5.26 rows=2 width=4) (actual time=0.466..0.495 rows=1 loops=1)
             Index Cond: (slug = ANY ('{gray-hat,jeans}'::citext[]))
Planning Time: 5.734 ms
Execution Time: 35.722 ms

----------------------------------------------------------------------------------------------------------------------------

Hash Join  (cost=7.00..719.99 rows=1 width=4) (actual time=31.624..62.787 rows=2 loops=1)
 Hash Cond: (bag_items.item_id = items.id)
 ->  Nested Loop  (cost=1.71..712.93 rows=676 width=8) (actual time=0.046..62.539 rows=766 loops=1)
       ->  Nested Loop  (cost=1.28..519.52 rows=72 width=8) (actual time=0.038..23.533 rows=52 loops=1)
             ->  Nested Loop  (cost=0.85..432.16 rows=154 width=12) (actual time=0.026..0.460 rows=28 loops=1)
                   ->  Index Scan using index_company_users_on_company_id_and_deleted_at on company_users  (cost=0.43..173.33 rows=154 width=8) (actual time=0.014..0.091 rows=28 loops=1)
                         Index Cond: ((company_id = 1595) AND (deleted_at IS NULL))
                   ->  Index Only Scan using users_pkey on users  (cost=0.43..1.68 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=28)
                         Index Cond: (id = company_users.user_id)
                         Heap Fetches: 19
             ->  Index Scan using index_bags_on_user_id on bags  (cost=0.42..0.55 rows=2 width=8) (actual time=0.816..0.822 rows=2 loops=28)
                   Index Cond: (user_id = users.id)
                   Filter: ((NOT deleted) AND ((user_type)::text = 'User'::text))
                   Rows Removed by Filter: 0
       ->  Index Scan using index_bag_items_on_bag_id on bag_items  (cost=0.43..2.01 rows=68 width=8) (actual time=0.590..0.745 rows=15 loops=52)
             Index Cond: (bag_id = bags.id)
             Filter: (NOT deleted)
             Rows Removed by Filter: 9
 ->  Hash  (cost=5.26..5.26 rows=2 width=4) (actual time=0.086..0.087 rows=1 loops=1)
       Buckets: 1024  Batches: 1  Memory Usage: 9kB
       ->  Index Scan using index_items_on_slug on items  (cost=0.41..5.26 rows=2 width=4) (actual time=0.045..0.083 rows=1 loops=1)
             Index Cond: (slug = ANY ('{gray-hat,jeans}'::citext[]))
Planning Time: 1.560 ms
Execution Time: 62.878 ms

----------------------------------------------------------------------------------------------------------------------------

Sort  (cost=92.21..92.30 rows=34 width=2029) (actual time=0.291..0.294 rows=12 loops=1)
 Sort Key: pending_user
 Sort Method: quicksort  Memory: 37kB
 ->  Bitmap Heap Scan on users  (cost=52.21..91.35 rows=34 width=2029) (actual time=0.090..0.230 rows=12 loops=1)
       Recheck Cond: ((id = ANY ('{2,2,2,2,234196,234196,234196,234196,234196,234196,234196,234711,605774,648863,912565,912565,1123517,1123517,1123517,1097143,1097143,1097143,257253,257253,257253,257253,257253,257253,257253,1026609,968371,1107724}'::integer[])) OR (id = ANY ('{2,257253}'::integer[])))
       Heap Blocks: exact=12
       ->  BitmapOr  (cost=52.21..52.21 rows=34 width=0) (actual time=0.078..0.079 rows=0 loops=1)
             ->  Bitmap Index Scan on users_pkey  (cost=0.00..49.12 rows=32 width=0) (actual time=0.074..0.074 rows=13 loops=1)
                   Index Cond: (id = ANY ('{2,2,2,2,234196,234196,234196,234196,234196,234196,234196,234711,605774,648863,912565,912565,1123517,1123517,1123517,1097143,1097143,1097143,257253,257253,257253,257253,257253,257253,257253,1026609,968371,1107724}'::integer[]))
             ->  Bitmap Index Scan on users_pkey  (cost=0.00..3.07 rows=2 width=0) (actual time=0.004..0.004 rows=3 loops=1)
                   Index Cond: (id = ANY ('{2,257253}'::integer[]))
Planning Time: 1.106 ms
Execution Time: 0.448 ms

大查詢大約需要 1000 毫秒,而 3 個小查詢總共需要大約 100 毫秒!這包括應用程序和數據庫之間的網路來回。讓我感到困惑的是,前兩個小​​查詢剛剛從大查詢中提取出來,並添加了第三個查詢以將兩者結合起來,因此總體複雜性基本相同。

知道為什麼會這樣嗎?我正在考慮通過將其他類似的慢速大查詢分解為較小的查詢來提高它們的性能。

我認為一般來說使用一個大查詢會更好,因為它讓 Postgres 查詢規劃器可以完全控制優化,

不幸的是,它也給了規劃者一個可能犯的錯誤的世界。它必鬚根據可用的行估計來製定計劃(它還能做什麼?)並且可用的估計相差超過 70,000 倍。因此,它選擇了一個糟糕的計劃也就不足為奇了。儘管在這種情況下,由於查詢的複雜性,它可能無論如何都無法提出一個好的計劃。結合 OR 的兩個分支而不得到重複通常會使用點陣圖,但在這種情況下,您需要一個間接點陣圖,PostgreSQL 沒有實現。但是改進軟體的想法更適合開發者的郵件列表,而不是這個論壇。

但是,當您將其分解為較小的查詢時,您會按照與您的直覺相匹配的方式進行操作。您應該比其他人更了解您的數據。但是,計劃者無法訪問您的直覺。

嘗試如下重寫查詢:

SELECT users.*
FROM users
WHERE users.id = ANY (array(SELECT /* first subquery */) ||
                     array(SELECT /* second subquery */));
ORDER BY users.pending_user;

這應該強制執行類似於三個單獨查詢的執行計劃。

引用自:https://dba.stackexchange.com/questions/315801