Postgresql

Postgresql:“分組依據”的結果非常慢

  • April 27, 2018

我正在嘗試修復程式碼檢查器上的一些嚴重性能問題。此處報告的問題: https ://github.com/Ericsson/codechecker/issues/1399#issuecomment-368862492

表格如下:

                                      Table "public.reports"
     Column      |            Type             |                      Modifiers                       
------------------+-----------------------------+------------------------------------------------------
id               | integer                     | not null default nextval('reports_id_seq'::regclass)
file_id          | integer                     | 
run_id           | integer                     | 
bug_id           | character varying           | 
checker_id       | character varying           | 
checker_cat      | character varying           | 
bug_type         | character varying           | 
severity         | integer                     | 
line             | integer                     | 
column           | integer                     | 
checker_message  | character varying           | 
detection_status | detection_status            | 
detected_at      | timestamp without time zone | not null
fixed_at         | timestamp without time zone | 
Indexes:
   "pk_reports" PRIMARY KEY, btree (id)
   "ix_reports_bug_id" btree (bug_id)
   "ix_reports_file_id" btree (file_id)
   "ix_reports_run_id" btree (run_id)
Foreign-key constraints:
   "fk_reports_file_id_files" FOREIGN KEY (file_id) REFERENCES files(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
   "fk_reports_run_id_runs" FOREIGN KEY (run_id) REFERENCES runs(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
Referenced by:
   TABLE "bug_path_events" CONSTRAINT "fk_bug_path_events_report_id_reports" FOREIGN KEY (report_id) REFERENCES reports(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
   TABLE "bug_report_points" CONSTRAINT "fk_bug_report_points_report_id_reports" FOREIGN KEY (report_id) REFERENCES reports(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED

您會注意到缺少索引。

以下查詢需要一段時間才能執行(26 秒),表中有 2 469 648 個元素。

SELECT reports.run_id AS reports_run_id,
 reports.detection_status AS reports_detection_status,
 count(DISTINCT reports.bug_id) AS count_1
FROM reports
GROUP BY reports.run_id, reports.detection_status

GroupAggregate  (cost=566298.15..590994.88 rows=25 width=41) (actual time=6782.100..26183.127 rows=28 loops=1)
  Group Key: run_id, detection_status
  ->  Sort  (cost=566298.15..572472.27 rows=2469648 width=41) (actual time=6768.582..9561.902 rows=2469648 loops=1)
        Sort Key: run_id, detection_status
        Sort Method: external merge  Disk: 123104kB
        ->  Seq Scan on reports  (cost=0.00..152130.48 rows=2469648 width=41) (actual time=0.010..3248.899 rows=2469648 loops=1)
Planning time: 0.066 ms
Execution time: 26224.844 ms
(8 rows)

知道如何改進查詢或缺失索引嗎?謝謝!

在我手中,這個查詢在模擬數據集上的速度比你觀察到的快十倍,即使在非常糟糕的硬體上也是如此。它具有相同的EXPLAIN計劃和相似的估計成本,只是執行速度更快。所以要麼發生了一些非常奇怪的事情,要麼你的伺服器嚴重超載。調查可能在 PostgreSQL 之外的最佳選擇,例如使用top, perf,vmstat等。

就索引而言,最適合此查詢的索引是:

create index on reports (run_id , detection_status, bug_id);

這允許它使用僅索引掃描來完成查詢,並且在我手中快了 4 到 5 倍。但我不知道這將如何轉化為您的伺服器,因為它似乎具有未診斷的性能問題。此外,您需要保持reports良好的真空狀態才能有效,否則掃描的僅索引性質將失去其優勢。您可能需要修改表 autovac 參數以使其被充分清理,或設置您自己的手動清理腳本。

知道如何改進查詢或缺失索引嗎?謝謝!

您已經在reports.run_id. 它未使用,這意味著其中之一

  • 資源問題?(配置/記憶體/主機負載等)
  • 使索引不適合查詢的數據分佈(在某些情況下,索引無濟於事)。

要測試索引是否有幫助,您可以重複查詢,並將順序掃描首選項設置為零:

SET enable_seqscan TO false;
/* repeat - your - query - here */

為了獲得更好的效果,請發布ANALYZE VERBOSE reports和的結果SELECT * FROM pg_stats WHERE tablename = 'reports'。還要讓我們知道數據集大小和資源配置參數(work_mem, shared_buffers)以及數據庫主機參數(作業系統、記憶體、磁碟)。

引用自:https://dba.stackexchange.com/questions/203343