多表 GROUP BY 查詢優化

February 13, 2015

目前我有一個GROUP BY非常慢的多表/列查詢。以下是查詢使用的表架構：

報告表

                                                               Table "report"
      Column       |           Type           |                           Modifiers                           | Storage  | Stats target | Description 
--------------------+--------------------------+---------------------------------------------------------------+----------+--------------+-------------
id                 | integer                  | not null default nextval('dashboard_report_id_seq'::regclass) | plain    |              | 
county             | character varying(200)   | not null                                                      | extended |              | 
district           | character varying(200)   | not null                                                      | extended |              | 
report_name        | character varying(500)   | not null                                                      | extended |              | 
Indexes:
   "dashboard_report_pkey" PRIMARY KEY, btree (id)

問題表

                                                      Table "question"
  Column    |          Type          |                            Modifiers                            | Storage  | Stats target | Description 
-------------+------------------------+-----------------------------------------------------------------+----------+--------------+-------------
id          | integer                | not null default nextval('dashboard_question_id_seq'::regclass) | plain    |              | 
name        | character varying(255) | not null                                                        | extended |              | 
label       | character varying(255) | not null                                                        | extended |              | 
report_type | character varying(255) | not null                                                        | extended |              | 
Indexes:
   "dashboard_question_pkey" PRIMARY KEY, btree (id)

問答表

                                                          Table "questionanswer"
  Column    |          Type          |                               Modifiers                               | Storage  | Stats target | Description 
-------------+------------------------+-----------------------------------------------------------------------+----------+--------------+-------------
id          | integer                | not null default nextval('dashboard_questionanswer_id_seq'::regclass) | plain    |              | 
answer      | character varying(255) | not null                                                              | extended |              | 
question_id | integer                | not null                                                              | plain    |              | 
report_id   | integer                | not null                                                              | plain    |              | 
Indexes:
   "dashboard_questionanswer_pkey" PRIMARY KEY, btree (id)
   "dashboard_questionanswer_6f78b20c" btree (report_id)
   "dashboard_questionanswer_7aa0f6ee" btree (question_id)
Foreign-key constraints:
   "dashboard_que_report_id_4c2a87ee585b6121_fk_dashboard_report_id" FOREIGN KEY (report_id) REFERENCES dashboard_report(id) DEFERRABLE INITIALLY DEFERRED
   "dashboard_question_id_522f82097923c241_fk_dashboard_question_id" FOREIGN KEY (question_id) REFERENCES dashboard_question(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

詢問：

SELECT 
   "report"."county", 
   "report"."district", 
   "report"."report_name", 
   "question"."report_type", 
   COUNT(DISTINCT "questionanswer"."report_id")
FROM "questionanswer" 
   INNER JOIN "question" ON ( "questionanswer"."question_id" = "question"."id" )
   INNER JOIN "report" ON ( "questionanswer"."report_id" = "report"."id" )
WHERE 
   "question"."name" = 'touch' 
GROUP BY 
   "report"."county", 
   "report"."district", 
   "report"."report_name", 
   "question"."report_type" ;

我不確定從哪裡開始優化此查詢，目前它在我的系統上相當慢〜 300 毫秒。我確實嘗試單獨索引子句引用的所有列，GROUP BY但這也無助於查詢時間。

所以基本上使用者送出報告，其數據儲存在報告中，每個報告有多個 QuestionAnswers，儲存在 QuestionAnswer 表中。問題元數據位於問題表中。QuestionAnswer 表記錄會很大，接下來是 Report 表數據，Question 表有一組只有 40 個問題。查詢的目的是找出針對特定問題的每個縣、區和報告類型，我們有多少報告回答了該問題。

我的 Postgres Sql 版本是 PostgreSQL 9.3.5。

查詢計劃在這裡。

我的機器硬體規格是 MacOSX 10.9.5 4-Core 8 GM RAM SSD，加密文件系統。

GROUP BY會索引幫助引用的所有列嗎？

還是我應該選擇多列索引？

我確實想知道，為什麼你有問題report_type的as 屬性？儘管如此，您的目標是：
查詢的目的是找出針對特定問題的每個縣、區和報告類型，我們有多少報告回答了該問題。
為什麼要包括report_name在GROUP BY步驟中？這與您的定義相衝突。我認為你應該刪除它：
SELECT r.county, r.district, q.report_type
    , count(DISTINCT r.id) AS reports
FROM   question q 
JOIN   questionanswer qa ON qa.question_id = q.id
JOIN   report         r  ON qa.report_id = r.id
WHERE  q.name = 'touch' 
GROUP  BY 1,2,3;
此外，只要您將查詢限制為單個問題，每個定義的結果中只有一個問題。 report_type將其包含在結果和GROUP BY子句中不會更改數字。
至於性能：要麼創建一個**UNIQUE約束**（按(question_id, report_id)那個順序！），就像我懷疑你應該有的那樣：
ALTER TABLE questionanswer ADD CONSTRAINT qa_uni UNIQUE (question_id, report_id);
或者，除此之外，至少在(question_id, report_id).
為什麼索引/約束中的列順序很重要？
複合索引是否也適用於第一個欄位的查詢？
有了UNIQUE約束，查詢變得相當便宜：
    , count(*) AS reports
只要你只有40 questions你不需要索引question.name，但只要你選擇問題name，你應該仍然UNIQUE對該列有約束。
剩下的就靠PKreport了。
如果您真的想為每個問題計算不同的縣和區，相關查詢：
SELECT q.id, q.name, q.report_type
    , count(DISTINCT r.county)      AS distinct_counties
    , count(DISTINCT r.district)    AS distinct_districts
FROM   question q 
JOIN   questionanswer qa ON qa.question_id = q.id
JOIN   report         r  ON qa.report_id = r.id
WHERE  q.name = 'touch' 
GROUP  BY 1;  -- the PK column covers the whole table

引用自：https://dba.stackexchange.com/questions/91701

多表 GROUP BY 查詢優化

相關問答

列的數據類型會影響查詢性能嗎？

Postgres：性能：慢速索引掃描

Postgres 不使用 group by 的表達式索引

如何優化索引列上的 IN 查詢

我應該將時間戳部分拆分為單獨的列嗎？

空間索引能否幫助“範圍-按-限制”查詢