如何更快地搜尋一個有 8000 萬條記錄的表？

June 25, 2020

我有一個包含大約 8000 萬條記錄的表，我想查找使用者有權訪問的列表和工作區的所有活動。因此，首先，我獲取列表和工作區的 ID，然後執行以下查詢：

select *, COALESCE("origin_created_at", "created_at") AS "created_at",
 COALESCE("updated_at", "origin_updated_at") AS "updated_at" 
from "activities" 
where ("listId" in (310,214088,219,220,271,222,28434,36046,43233,38236,
 1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,
 395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,
 587679,841589,722320,551,170392,421035,197071,632736,632742,632755,
 632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,
 58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,
 158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,
 643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,
 886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,
 226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,
 323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,
 267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,
 74018) 
or "workspaceId" in (137, 81, 111, 424284, 425935, 430658, 84, 163840, 
 3, 4, 281105, 57, 64642, 96660, 38739, 273574, 295312, 79, 213, 
 240478, 424760, 65, 36989)) 
and (("isBulk" = false or "activities"."type" = 0) 
      and "activities"."deprecated_at" is null) 
order by COALESCE("origin_created_at", "created_at") DESC, "id" desc
limit 40;

這是執行計劃

Limit  (cost=2446886.55..2446886.65 rows=40 width=1002) (actual time=44452.393..44452.418 rows=40 loops=1)
  -&gt;  Sort  (cost=2446886.55..2449439.67 rows=1021250 width=1002) (actual time=44452.391..44452.401 rows=40 loops=1)
        Sort Key: (COALESCE(origin_created_at, created_at)) DESC, id DESC
        Sort Method: top-N heapsort  Memory: 37kB
        -&gt;  Bitmap Heap Scan on activities  (cost=37546.04..2414605.20 rows=1021250 width=1002) (actual time=1043.663..43916.385 rows=568891 loops=1)
              Recheck Cond: (("listId" = ANY ('{310,214088,219,220,271,222,28434,36046,43233,38236,1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,587679,841589,722320,551,170392,421035,197071,632736,632742,632755,632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,74018}'::integer[])) OR ("workspaceId" = ANY ('{137,81,111,424284,425935,430658,84,163840,3,4,281105,57,64642,96660,38739,273574,295312,79,213,240478,424760,65,36989}'::integer[])))
              Rows Removed by Index Recheck: 9072392
              Filter: ((deprecated_at IS NULL) AND ((NOT "isBulk") OR (type = 0)))
              Rows Removed by Filter: 113630
              Heap Blocks: exact=41259 lossy=271838
              -&gt;  BitmapOr  (cost=37546.04..37546.04 rows=1350377 width=0) (actual time=1032.769..1032.769 rows=0 loops=1)
                    -&gt;  Bitmap Index Scan on activities_list_id_index  (cost=0.00..17333.10 rows=617933 width=0) (actual time=118.412..118.412 rows=507019 loops=1)
                          Index Cond: ("listId" = ANY ('{310,214088,219,220,271,222,28434,36046,43233,38236,1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,587679,841589,722320,551,170392,421035,197071,632736,632742,632755,632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,74018}'::integer[]))
                    -&gt;  Bitmap Index Scan on activities_workspace_id_index  (cost=0.00..19702.32 rows=732444 width=0) (actual time=914.355..914.355 rows=682628 loops=1)
                          Index Cond: ("workspaceId" = ANY ('{137,81,111,424284,425935,430658,84,163840,3,4,281105,57,64642,96660,38739,273574,295312,79,213,240478,424760,65,36989}'::integer[]))
Planning time: 2.882 ms
Execution time: 44452.871 ms
(17 rows)

如計劃中所述，PostgreSQL 使用“點陣圖堆掃描”來掃描使查詢變慢的活動，儘管這兩個列都已編入索引。表上總共有 4 個索引，以下列各一個：type、listId、workspaceId、organizationId。

如何使查詢更快？還是有更好的方法來重寫查詢？

如果您增加該查詢將變得更快work_mem（因為那時將不再有“有損”塊）。
id首先從一個表中選擇所有s ，然後根據這些 s 從另一個表中選擇行的想法從id根本上是錯誤的。您應該改為連接兩個表並使用單個查詢執行相同的工作。

對於您的查詢…
一個非常小的LIMIT 40
而不是很有選擇性的WHERE條件①
…這個部分的、多列的表達式索引可能會產生奇蹟：
CREATE INDEX foo ON activities (COALESCE(origin_created_at, created_at) DESC, id DESC)
WHERE ("isBulk" = false OR type = 0) AND deprecated_at IS NULL;
① 目前，在做了很多不必要的工作後，大部分是由於您的work_mem設置不足，請參閱：
帶有點陣圖索引掃描的查詢計劃中的“重新檢查條件：”行
…在Bitmap Heap Scan、Recheck和Filter步驟rows=568891之後獲得資格。排序後，僅返回其中的 40 個（！）。平均而言，在 8000 萬行中，每 140 行符合條件。
使用新索引，Postgres 可以只遍歷與查詢排序順序匹配的索引，直到找到 40 個符合條件的行。Postgres 必須平均讀取140 * 40 = 5600行。應該快得多。
附加"listId"和"workspaceId"索引可能會有所幫助。使索引更大（壞），但我們 Postgres 可以在進入堆之前過濾索引元組，其中可能仍然需要過濾一些奇怪的死元組，從而將堆訪問減少到絕對最小值：
CREATE INDEX foo ON activities (COALESCE(origin_created_at, created_at) DESC, id DESC, "listId"`, `"workspaceId")
WHERE ("isBulk" = false OR type = 0) AND deprecated_at IS NULL;
如果您不講述整個故事，所有這一切都可能會落空。
有關的：
Postgres 有時使用劣質索引來表示 WHERE a IN (…) ORDER BY b LIMIT N
過濾數組文本$$ $$並按時間戳排序
空間索引能否幫助“範圍-按-限制”查詢
旁白
COALESCE(updated_at, origin_updated_at) AS updated_at? 那不應該COALESCE(origin_updated_at, updated_at) AS updated_at符合邏輯COALESCE(origin_created_at, created_at) AS created_at嗎？
考慮在 Postgres 中使用不帶雙引號的合法小寫標識符，以使您的生活更輕鬆。

引用自：https://dba.stackexchange.com/questions/269775

如何更快地搜尋一個有 8000 萬條記錄的表？

旁白

相關問答

優化查詢以在分區表中跨多天獲取數據

具有大 IN 的 Postgres 查詢，並且在臨時表上加入似乎不起作用

優化對 690,000 行表的昂貴的 GROUP BY / ORDER BY 查詢

如何使用 OR 子句優化帶有時間戳的查詢？

具有大 IN 和數百萬條記錄的優化選擇結果

提高 COUNT、JOIN 和 GROUP BY 查詢性能