Postgresql：“分組依據”的結果非常慢

April 27, 2018

我正在嘗試修復程式碼檢查器上的一些嚴重性能問題。此處報告的問題： https ://github.com/Ericsson/codechecker/issues/1399#issuecomment-368862492

表格如下：

                                      Table "public.reports"
     Column      |            Type             |                      Modifiers                       
------------------+-----------------------------+------------------------------------------------------
id               | integer                     | not null default nextval('reports_id_seq'::regclass)
file_id          | integer                     | 
run_id           | integer                     | 
bug_id           | character varying           | 
checker_id       | character varying           | 
checker_cat      | character varying           | 
bug_type         | character varying           | 
severity         | integer                     | 
line             | integer                     | 
column           | integer                     | 
checker_message  | character varying           | 
detection_status | detection_status            | 
detected_at      | timestamp without time zone | not null
fixed_at         | timestamp without time zone | 
Indexes:
   "pk_reports" PRIMARY KEY, btree (id)
   "ix_reports_bug_id" btree (bug_id)
   "ix_reports_file_id" btree (file_id)
   "ix_reports_run_id" btree (run_id)
Foreign-key constraints:
   "fk_reports_file_id_files" FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
   "fk_reports_run_id_runs" FOREIGN KEY (run_id) REFERENCES runs(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
Referenced by:
   TABLE "bug_path_events" CONSTRAINT "fk_bug_path_events_report_id_reports" FOREIGN KEY (report_id) REFERENCES reports(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
   TABLE "bug_report_points" CONSTRAINT "fk_bug_report_points_report_id_reports" FOREIGN KEY (report_id) REFERENCES reports(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED

您會注意到缺少索引。

以下查詢需要一段時間才能執行（26 秒），表中有 2 469 648 個元素。

SELECT reports.run_id AS reports_run_id,
 reports.detection_status AS reports_detection_status,
 count(DISTINCT reports.bug_id) AS count_1
FROM reports
GROUP BY reports.run_id, reports.detection_status

和

GroupAggregate  (cost=566298.15..590994.88 rows=25 width=41) (actual time=6782.100..26183.127 rows=28 loops=1)
  Group Key: run_id, detection_status
  -&gt;  Sort  (cost=566298.15..572472.27 rows=2469648 width=41) (actual time=6768.582..9561.902 rows=2469648 loops=1)
        Sort Key: run_id, detection_status
        Sort Method: external merge  Disk: 123104kB
        -&gt;  Seq Scan on reports  (cost=0.00..152130.48 rows=2469648 width=41) (actual time=0.010..3248.899 rows=2469648 loops=1)
Planning time: 0.066 ms
Execution time: 26224.844 ms
(8 rows)

知道如何改進查詢或缺失索引嗎？謝謝！

在我手中，這個查詢在模擬數據集上的速度比你觀察到的快十倍，即使在非常糟糕的硬體上也是如此。它具有相同的EXPLAIN計劃和相似的估計成本，只是執行速度更快。所以要麼發生了一些非常奇怪的事情，要麼你的伺服器嚴重超載。調查可能在 PostgreSQL 之外的最佳選擇，例如使用top, perf,vmstat等。
就索引而言，最適合此查詢的索引是：
create index on reports (run_id , detection_status, bug_id);
這允許它使用僅索引掃描來完成查詢，並且在我手中快了 4 到 5 倍。但我不知道這將如何轉化為您的伺服器，因為它似乎具有未診斷的性能問題。此外，您需要保持reports良好的真空狀態才能有效，否則掃描的僅索引性質將失去其優勢。您可能需要修改表 autovac 參數以使其被充分清理，或設置您自己的手動清理腳本。

知道如何改進查詢或缺失索引嗎？謝謝！
您已經在reports.run_id. 它未使用，這意味著其中之一
資源問題？（配置/記憶體/主機負載等）
使索引不適合查詢的數據分佈（在某些情況下，索引無濟於事）。
要測試索引是否有幫助，您可以重複查詢，並將順序掃描首選項設置為零：
SET enable_seqscan TO false;
/* repeat - your - query - here */
為了獲得更好的效果，請發布ANALYZE VERBOSE reports和的結果SELECT * FROM pg_stats WHERE tablename = 'reports'。還要讓我們知道數據集大小和資源配置參數（work_mem, shared_buffers）以及數據庫主機參數（作業系統、記憶體、磁碟）。

引用自：https://dba.stackexchange.com/questions/203343

Postgresql：“分組依據”的結果非常慢

相關問答

PostgreSQL 可以使用索引來加快計數（不同）查詢嗎？

GROUP BY，但每個使用者只使用一行

避免由 DATE_TRUNC 引起的對 Indexed Group By 排序

兩列不同，如何擺脫“重複”

提高大型 PostgresSQL 表中 COUNT/GROUP-BY 的性能？

高效部分 DISTINCT ON