PostgreSQL

More efficient GROUP BY to count the items in each group

  • September 15, 2019

My query is:

SELECT COUNT("EventType"."id") AS "eventCount", "EventType"."id" AS "EventType.id"
 FROM "events" AS "Event" 
 INNER JOIN "event_types" AS "EventType" ON "Event"."eventTypeId" = "EventType"."id"
 INNER JOIN "projects" AS "EventType->Project" ON "EventType"."projectId" = "EventType->Project"."id"
 WHERE "EventType->Project"."id" = 142
 GROUP BY "EventType"."id";

Basically, I want to know, for a given project, how many events of each type have occurred.

The relevant schema is:

                                                          Table "public.projects"
     Column       |           Type           |                             Modifiers
-------------------+--------------------------+---------------------------------------------------------
id                | integer                  | not null default nextval('projects_id_seq'::regclass) 
Indexes:
   "projects_pkey" PRIMARY KEY, btree (id)
Referenced by:
   TABLE "event_types" CONSTRAINT "event_types_projectId_fkey" FOREIGN KEY ("projectId") REFERENCES projects(id) ON UPDATE CASCADE ON DELETE CASCADE

                                                        Table "public.event_types"
   Column     |           Type           |                                 Modifiers
---------------+--------------------------+------------------------------------------------------------
id            | integer                  | not null default nextval('event_types_id_seq'::regclass)
projectId     | integer                  | not null
Indexes:
   "event_types_pkey" PRIMARY KEY, btree (id)
   "event_types_project_id" btree ("projectId")
Foreign-key constraints:
   "event_types_projectId_fkey" FOREIGN KEY ("projectId") REFERENCES projects(id) ON UPDATE CASCADE ON DELETE CASCADE
Referenced by:
   TABLE "events" CONSTRAINT "events_eventTypeId_fkey" FOREIGN KEY ("eventTypeId") REFERENCES event_types(id) ON UPDATE CASCADE ON DELETE CASCADE

                                             Table "public.events"
Column      |  Type   |                               Modifiers
-------------+---------+-------------------------------------------------------
id          | integer | not null default nextval('events_id_seq'::regclass)
eventTypeId | integer | not null
Indexes:
   "events_pkey" PRIMARY KEY, btree (id)
   "events_event_type_id" btree ("eventTypeId")
Foreign-key constraints:
   "events_eventTypeId_fkey" FOREIGN KEY ("eventTypeId") REFERENCES event_types(id) ON UPDATE CASCADE ON DELETE CASCADE

When I run it with EXPLAIN ANALYZE, the result is:

   QUERY PLAN                                                                                                       
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=320957.49..320962.39 rows=490 width=12) (actual time=2612.748..2612.814 rows=459 loops=1)
  Group Key: "EventType".id
  ->  Hash Join  (cost=122.12..312038.18 rows=1783862 width=4) (actual time=386.978..2501.421 rows=690140 loops=1)
        Hash Cond: ("Event"."eventTypeId" = "EventType".id)
        ->  Seq Scan on events "Event"  (cost=0.00..239469.41 rows=14562141 width=4) (actual time=0.026..1272.817 rows=14558556 loops=1)
        ->  Hash  (cost=116.00..116.00 rows=490 width=4) (actual time=0.323..0.323 rows=459 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 25kB
              ->  Nested Loop  (cost=0.28..116.00 rows=490 width=4) (actual time=0.061..0.263 rows=459 loops=1)
                    ->  Seq Scan on projects "EventType->Project"  (cost=0.00..1.56 rows=1 width=4) (actual time=0.017..0.021 rows=1 loops=1)
                          Filter: (id = 142)
                          Rows Removed by Filter: 47
                    ->  Index Scan using event_types_project_id on event_types "EventType"  (cost=0.28..109.53 rows=490 width=8) (actual time=0.042..0.193 rows=459 loops=1)
                          Index Cond: ("projectId" = 142)
Planning time: 3.891 ms
Execution time: 2613.033 ms

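It appears to be scanning the entire events table (which is very large), and the whole query takes quite a long time. I had expected to get away with scanning only the index. My idea was that the index keeps a count for each index key, but maybe that mental model is flawed.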

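Is there a way to speed up this type of query? If not, I can keep track of the counts myself, but fixing something in the query or the schema seems easier if that is possible.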

SELECT et.id AS event_type_id, count(e."eventTypeId") AS event_count
FROM   event_types et 
LEFT   JOIN events e ON e."eventTypeId" = et.id
WHERE  et."projectId" = 142
GROUP  BY et.id;

Major points

0. Style

Avoid CaMeL-case names if you can, to make your life and ours easier.

At the very least, use legal table aliases in queries, so that the aliases do not need double quotes.

1. Correctness

Your query misses a corner case for its declared purpose:

"I want to know, for a given project, how many events of each type have occurred."

The [INNER] JOIN to events excludes all types with 0 events from the result. Typically, you want a LEFT [OUTER] JOIN to get the complete list of event types, including those with 0 events.

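Consequently, count a non-null column from the table events instead. The obvious candidate is count(e."eventTypeId"). That column is defined NOT NULL, so after the LEFT JOIN it is NULL if (and only if) no event was found for the type, and count() does not count NULL values.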
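The difference between counting the joined column and counting rows can be seen in a minimal sketch. The tables demo_event_types and demo_events below are hypothetical scratch tables invented for this illustration, not the schema from the question:

```sql
-- Hypothetical scratch tables, for illustration only.
CREATE TEMP TABLE demo_event_types (id int PRIMARY KEY);
CREATE TEMP TABLE demo_events (
   id            int PRIMARY KEY,
   event_type_id int NOT NULL
);

-- Type 1 has two events, type 2 has none.
INSERT INTO demo_event_types VALUES (1), (2);
INSERT INTO demo_events VALUES (1, 1), (2, 1);

SELECT et.id
     , count(e.event_type_id) AS event_count  -- skips the NULL produced by the outer join
     , count(*)               AS row_count    -- counts the NULL-extended row as well
FROM   demo_event_types et
LEFT   JOIN demo_events e ON e.event_type_id = et.id
GROUP  BY et.id
ORDER  BY et.id;

-- id | event_count | row_count
--  1 |           2 |         2
--  2 |           0 |         1
```

With an [INNER] JOIN, the row for type 2 would not appear at all. With count(*), it would be reported as 1 instead of 0.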

2. Performance

Since referential integrity is enforced with FK constraints, there is no need to involve the table projects at all. We have the id, and that is all we need.

So adapt the WHERE clause to WHERE et."projectId" = 142. Shorter, faster.

3. Index

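Ideally, replace the existing index event_types_project_id on ("projectId") with a multicolumn index on ("projectId", id). Exactly the same size, multiple benefits. I am mainly aiming for an index-only scan on the table event_types.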

And the index on ("eventTypeId") in the table events, which you already have (events_event_type_id).
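The index replacement could look roughly like this. It is a sketch, not a tested migration: the new index name event_types_project_id_id is an assumption made up for the example, and only the old name comes from the schema above.

```sql
BEGIN;

-- Multicolumn index that serves the filter on "projectId" and also supplies id,
-- so the planner can consider an index-only scan on event_types.
CREATE INDEX event_types_project_id_id        -- hypothetical name
    ON event_types ("projectId", id);

-- The single-column index is covered by the new one.
DROP INDEX event_types_project_id;

COMMIT;
```

A plain CREATE INDEX blocks writes to the table while it builds. If that matters, CREATE INDEX CONCURRENTLY is the usual alternative, but it cannot run inside a transaction block, so the two statements would have to be issued separately.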

Also:

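"I had expected to get away with scanning only the index. My idea was that the index keeps a count for each index key, but maybe that mental model is flawed."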

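No, indexes do not keep counts. There are only internal statistics for the most common values and so on. But you can get away with scanning only the index, provided the preconditions for an index-only scan are met.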
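One way to check those preconditions is sketched below. It assumes a PostgreSQL version with index-only scans (9.2 or later) and sufficient privileges to run VACUUM. Whether the planner actually chooses an index-only scan still depends on its cost estimates.

```sql
-- Index-only scans depend on the visibility map, which VACUUM maintains.
VACUUM (ANALYZE) events;

-- How much of the table is marked all-visible?
-- The closer relallvisible is to relpages, the fewer heap visits are needed.
SELECT relpages, relallvisible
FROM   pg_class
WHERE  relname = 'events';

-- Inspect the plan. For an "Index Only Scan" node, a low "Heap Fetches"
-- number means the index alone answered most of the lookups.
EXPLAIN (ANALYZE, BUFFERS)
SELECT et.id AS event_type_id, count(e."eventTypeId") AS event_count
FROM   event_types et
LEFT   JOIN events e ON e."eventTypeId" = et.id
WHERE  et."projectId" = 142
GROUP  BY et.id;

-- The internal statistics mentioned above are estimates for the planner, not exact counts:
SELECT most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'events'
AND    attname   = 'eventTypeId';
```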

Source: https://dba.stackexchange.com/questions/248688