Postgresql
Postgresql:“分組依據”的結果非常慢
我正在嘗試修復程式碼檢查器上的一些嚴重性能問題。此處報告的問題: https ://github.com/Ericsson/codechecker/issues/1399#issuecomment-368862492
表格如下:
Table "public.reports" Column | Type | Modifiers ------------------+-----------------------------+------------------------------------------------------ id | integer | not null default nextval('reports_id_seq'::regclass) file_id | integer | run_id | integer | bug_id | character varying | checker_id | character varying | checker_cat | character varying | bug_type | character varying | severity | integer | line | integer | column | integer | checker_message | character varying | detection_status | detection_status | detected_at | timestamp without time zone | not null fixed_at | timestamp without time zone | Indexes: "pk_reports" PRIMARY KEY, btree (id) "ix_reports_bug_id" btree (bug_id) "ix_reports_file_id" btree (file_id) "ix_reports_run_id" btree (run_id) Foreign-key constraints: "fk_reports_file_id_files" FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED "fk_reports_run_id_runs" FOREIGN KEY (run_id) REFERENCES runs(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED Referenced by: TABLE "bug_path_events" CONSTRAINT "fk_bug_path_events_report_id_reports" FOREIGN KEY (report_id) REFERENCES reports(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED TABLE "bug_report_points" CONSTRAINT "fk_bug_report_points_report_id_reports" FOREIGN KEY (report_id) REFERENCES reports(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
您會注意到缺少索引。
以下查詢需要一段時間才能執行(26 秒),表中有 2 469 648 個元素。
SELECT reports.run_id AS reports_run_id, reports.detection_status AS reports_detection_status, count(DISTINCT reports.bug_id) AS count_1 FROM reports GROUP BY reports.run_id, reports.detection_status
和
GroupAggregate (cost=566298.15..590994.88 rows=25 width=41) (actual time=6782.100..26183.127 rows=28 loops=1) Group Key: run_id, detection_status -> Sort (cost=566298.15..572472.27 rows=2469648 width=41) (actual time=6768.582..9561.902 rows=2469648 loops=1) Sort Key: run_id, detection_status Sort Method: external merge Disk: 123104kB -> Seq Scan on reports (cost=0.00..152130.48 rows=2469648 width=41) (actual time=0.010..3248.899 rows=2469648 loops=1) Planning time: 0.066 ms Execution time: 26224.844 ms (8 rows)
知道如何改進查詢或缺失索引嗎?謝謝!
在我手中,這個查詢在模擬數據集上的速度比你觀察到的快十倍,即使在非常糟糕的硬體上也是如此。它具有相同的
EXPLAIN
計劃和相似的估計成本,只是執行速度更快。所以要麼發生了一些非常奇怪的事情,要麼你的伺服器嚴重超載。調查可能在 PostgreSQL 之外的最佳選擇,例如使用top
,perf
,vmstat
等。就索引而言,最適合此查詢的索引是:
create index on reports (run_id , detection_status, bug_id);
這允許它使用僅索引掃描來完成查詢,並且在我手中快了 4 到 5 倍。但我不知道這將如何轉化為您的伺服器,因為它似乎具有未診斷的性能問題。此外,您需要保持
reports
良好的真空狀態才能有效,否則掃描的僅索引性質將失去其優勢。您可能需要修改表 autovac 參數以使其被充分清理,或設置您自己的手動清理腳本。
知道如何改進查詢或缺失索引嗎?謝謝!
您已經在
reports.run_id
. 它未使用,這意味著其中之一
- 資源問題?(配置/記憶體/主機負載等)
- 使索引不適合查詢的數據分佈(在某些情況下,索引無濟於事)。
要測試索引是否有幫助,您可以重複查詢,並將順序掃描首選項設置為零:
SET enable_seqscan TO false; /* repeat - your - query - here */
為了獲得更好的效果,請發布
ANALYZE VERBOSE reports
和的結果SELECT * FROM pg_stats WHERE tablename = 'reports'
。還要讓我們知道數據集大小和資源配置參數(work_mem
,shared_buffers
)以及數據庫主機參數(作業系統、記憶體、磁碟)。