多表 GROUP BY 查詢優化
目前我有一個
GROUP BY
非常慢的多表/列查詢。以下是查詢使用的表架構:報告表
Table "report" Column | Type | Modifiers | Storage | Stats target | Description --------------------+--------------------------+---------------------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('dashboard_report_id_seq'::regclass) | plain | | county | character varying(200) | not null | extended | | district | character varying(200) | not null | extended | | report_name | character varying(500) | not null | extended | | Indexes: "dashboard_report_pkey" PRIMARY KEY, btree (id)
問題表
Table "question" Column | Type | Modifiers | Storage | Stats target | Description -------------+------------------------+-----------------------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('dashboard_question_id_seq'::regclass) | plain | | name | character varying(255) | not null | extended | | label | character varying(255) | not null | extended | | report_type | character varying(255) | not null | extended | | Indexes: "dashboard_question_pkey" PRIMARY KEY, btree (id)
問答表
Table "questionanswer" Column | Type | Modifiers | Storage | Stats target | Description -------------+------------------------+-----------------------------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('dashboard_questionanswer_id_seq'::regclass) | plain | | answer | character varying(255) | not null | extended | | question_id | integer | not null | plain | | report_id | integer | not null | plain | | Indexes: "dashboard_questionanswer_pkey" PRIMARY KEY, btree (id) "dashboard_questionanswer_6f78b20c" btree (report_id) "dashboard_questionanswer_7aa0f6ee" btree (question_id) Foreign-key constraints: "dashboard_que_report_id_4c2a87ee585b6121_fk_dashboard_report_id" FOREIGN KEY (report_id) REFERENCES dashboard_report(id) DEFERRABLE INITIALLY DEFERRED "dashboard_question_id_522f82097923c241_fk_dashboard_question_id" FOREIGN KEY (question_id) REFERENCES dashboard_question(id) DEFERRABLE INITIALLY DEFERRED Has OIDs: no
詢問:
SELECT "report"."county", "report"."district", "report"."report_name", "question"."report_type", COUNT(DISTINCT "questionanswer"."report_id") FROM "questionanswer" INNER JOIN "question" ON ( "questionanswer"."question_id" = "question"."id" ) INNER JOIN "report" ON ( "questionanswer"."report_id" = "report"."id" ) WHERE "question"."name" = 'touch' GROUP BY "report"."county", "report"."district", "report"."report_name", "question"."report_type" ;
我不確定從哪裡開始優化此查詢,目前它在我的系統上相當慢〜 300 毫秒。我確實嘗試單獨索引子句引用的所有列,
GROUP BY
但這也無助於查詢時間。所以基本上使用者送出報告,其數據儲存在報告中,每個報告有多個 QuestionAnswers,儲存在 QuestionAnswer 表中。問題元數據位於問題表中。QuestionAnswer 表記錄會很大,接下來是 Report 表數據,Question 表有一組只有 40 個問題。查詢的目的是找出針對特定問題的每個縣、區和報告類型,我們有多少報告回答了該問題。
我的 Postgres Sql 版本是 PostgreSQL 9.3.5。
查詢計劃在這裡。
我的機器硬體規格是 MacOSX 10.9.5 4-Core 8 GM RAM SSD,加密文件系統。
GROUP BY
會索引幫助引用的所有列嗎?還是我應該選擇多列索引?
我確實想知道,為什麼你有問題
report_type
的as 屬性? 儘管如此,您的目標是:查詢的目的是找出針對特定問題的每個縣、區和報告類型,我們有多少報告回答了該問題。
為什麼要包括
report_name
在GROUP BY
步驟中?這與您的定義相衝突。我認為你應該刪除它:SELECT r.county, r.district, q.report_type , count(DISTINCT r.id) AS reports FROM question q JOIN questionanswer qa ON qa.question_id = q.id JOIN report r ON qa.report_id = r.id WHERE q.name = 'touch' GROUP BY 1,2,3;
此外,只要您將查詢限制為單個問題,每個定義的結果中只有一個問題。
report_type
將其包含在結果和GROUP BY
子句中不會更改數字。至於性能:要麼創建一個**
UNIQUE
約束**(按(question_id, report_id)
那個順序!),就像我懷疑你應該有的那樣:ALTER TABLE questionanswer ADD CONSTRAINT qa_uni UNIQUE (question_id, report_id);
或者,除此之外,至少在
(question_id, report_id)
.為什麼索引/約束中的列順序很重要?
有了
UNIQUE
約束,查詢變得相當便宜:, count(*) AS reports
只要你只有
40 questions
你不需要索引question.name
,但只要你選擇問題name
,你應該仍然UNIQUE
對該列有約束。剩下的就靠PK
report
了。如果您真的想為每個問題計算不同的縣和區,相關查詢:
SELECT q.id, q.name, q.report_type , count(DISTINCT r.county) AS distinct_counties , count(DISTINCT r.district) AS distinct_districts FROM question q JOIN questionanswer qa ON qa.question_id = q.id JOIN report r ON qa.report_id = r.id WHERE q.name = 'touch' GROUP BY 1; -- the PK column covers the whole table