Optimization

Poor performance for a weekly query over a timestamp range in PostgreSQL 9.3

  • March 4, 2019

I have a slow query that generates a report of account activity per week over the past year. The table currently has nearly 5 million rows, and this query currently takes 8 seconds to execute. The (current) bottleneck is the sequential scan over the timestamp range.

account=> EXPLAIN ANALYZE SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;

GroupAggregate  (cost=450475.76..513465.44 rows=2290534 width=12) (actual time=7524.474..8003.291 rows=52 loops=1)
  Group Key: ((to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
  ->  Sort  (cost=450475.76..456202.09 rows=2290534 width=12) (actual time=7519.053..7691.924 rows=2314164 loops=1)
        Sort Key: ((to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
        Sort Method: external sort  Disk: 40704kB
        ->  Seq Scan on account_history  (cost=0.00..169364.81 rows=2290534 width=12) (actual time=1470.438..6222.076 rows=2314164 loops=1)
              Filter: ((event_time <= now()) AND (event_time >= (now() - '357 days'::interval)))
              Rows Removed by Filter: 2591679
Planning time: 0.126 ms
Execution time: 8011.160 ms
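For context, the week-bucketing expression in the SELECT list formats each event_time as an ISO year/week pair ('IYYY-IW') and immediately parses it back, which yields the Monday that starts that ISO week, as a date. The equivalent logic can be sketched in Python (a sketch only; assumes Python 3.8+ for date.fromisocalendar) — note that in the query this round trip runs once per row, i.e. ~2.3 million times:

```python
from datetime import date

def iso_week_bucket(d: date) -> date:
    """Mimics to_timestamp(to_char(ts, 'IYYY-IW'), 'IYYY-IW')::date:
    format as ISO year/week, then parse back, yielding the Monday
    that starts that ISO week."""
    iso_year, iso_week, _ = d.isocalendar()          # the to_char() step
    return date.fromisocalendar(iso_year, iso_week, 1)  # the to_timestamp() step

# Every day of the same ISO week maps to the same bucket (its Monday).
print(iso_week_bucket(date(2017, 10, 23)))  # Monday -> 2017-10-23
print(iso_week_bucket(date(2017, 10, 29)))  # Sunday -> 2017-10-23
```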

The table:

account=> \d account_history
                   Table "public.account_history"
  Column    |            Type             |         Modifiers
-------------+-----------------------------+---------------------------
account     | integer                     | not null
event_code  | text                        | not null
event_time  | timestamp without time zone | not null default now()
description | text                        | not null default ''::text
Indexes:
   "account_history_idx" btree (account, event_time DESC)
   "account_id_idx" btree (account, event_code, event_time)
Foreign-key constraints:
   "account_fk" FOREIGN KEY (account) REFERENCES account(id) ON UPDATE CASCADE ON DELETE RESTRICT
   "event_code_fk" FOREIGN KEY (event_code) REFERENCES domain_account_event(code) ON UPDATE CASCADE ON DELETE RESTRICT

When I originally created this table, I added the timestamp column as part of a btree index, but I figured the sequential scan was due to the (then) small number of rows in the table (see related question).

However, now that the table has grown into the millions, I've noticed the performance problem with the query and discovered that the index is not being used by the query.

I tried adding an ordered index as recommended here, but that is evidently not being used in the execution plan either.

Is there a better way to index this table, or is there something inherent in my query that bypasses both indexes?


Update: When I add an index on the timestamp alone, that index is used. However, it only reduced the execution time by 25%:

account=> CREATE INDEX account_history_time_idx ON account_history (event_time DESC);

account=> EXPLAIN ANALYZE VERBOSE SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;

GroupAggregate  (cost=391870.30..454870.16 rows=2290904 width=12) (actual time=5481.930..6104.838 rows=52 loops=1)
  Output: ((to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date), count(DISTINCT account)
  Group Key: ((to_timestamp(to_char(date_trunc('week'::text, account_history.event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
  ->  Sort  (cost=391870.30..397597.56 rows=2290904 width=12) (actual time=5474.181..5771.903 rows=2314038 loops=1)
        Output: ((to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date), account
        Sort Key: ((to_timestamp(to_char(date_trunc('week'::text, account_history.event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
        Sort Method: external merge  Disk: 40688kB
        ->  Index Scan using account_history_time_idx on public.account_history  (cost=0.44..110710.59 rows=2290904 width=12) (actual time=0.108..4352.143 rows=2314038 loops=1)
              Output: (to_timestamp(to_char(date_trunc('week'::text, event_time), 'IYYY-IW'::text), 'IYYY-IW'::text))::date, account
              Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
Planning time: 0.204 ms
Execution time: 6112.832 ms

https://explain.depesz.com/s/PSfU

I also tried VACUUM FULL as suggested here, but it made no difference in the execution time.


Here are the execution plans for some simpler queries against the same table:

Simply counting the rows takes 0.5 seconds:

account=> EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM account_history;

Aggregate  (cost=97401.04..97401.05 rows=1 width=0) (actual time=551.179..551.179 rows=1 loops=1)
  Output: count(*)
  ->  Seq Scan on public.account_history  (cost=0.00..85136.43 rows=4905843 width=0) (actual time=0.039..344.675 rows=4905843 loops=1)
        Output: account, event_code, event_time, description
Planning time: 0.075 ms
Execution time: 551.209 ms

And with the same time-range clause, it takes less than one second:

account=> EXPLAIN ANALYZE VERBOSE SELECT COUNT(*) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now();

Aggregate  (cost=93527.57..93527.58 rows=1 width=0) (actual time=997.436..997.436 rows=1 loops=1)
  Output: count(*)
  ->  Index Only Scan using account_history_time_idx on public.account_history  (cost=0.44..87800.45 rows=2290849 width=0) (actual time=0.100..897.776 rows=2313987 loops=1)
        Output: event_time
        Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
        Heap Fetches: 2313987
Planning time: 0.239 ms
Execution time: 997.473 ms

Per the comments, I tried a simplified form of the query:

account=> EXPLAIN ANALYZE VERBOSE SELECT date_trunc('week', event_time) AS date, count(DISTINCT account) FROM account_history
WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;

GroupAggregate  (cost=374676.22..420493.00 rows=2290839 width=12) (actual time=2475.556..3078.191 rows=52 loops=1)
  Output: (date_trunc('week'::text, event_time)), count(DISTINCT account)
  Group Key: (date_trunc('week'::text, account_history.event_time))
  ->  Sort  (cost=374676.22..380403.32 rows=2290839 width=12) (actual time=2468.654..2763.739 rows=2313977 loops=1)
        Output: (date_trunc('week'::text, event_time)), account
        Sort Key: (date_trunc('week'::text, account_history.event_time))
        Sort Method: external merge  Disk: 49720kB
        ->  Index Scan using account_history_time_idx on public.account_history  (cost=0.44..93527.35 rows=2290839 width=12) (actual time=0.094..1537.488 rows=2313977 loops=1)
              Output: date_trunc('week'::text, event_time), account
              Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
Planning time: 0.220 ms
Execution time: 3086.828 ms
(12 rows)

account=> SELECT date_trunc('week', current_date) AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;
         date          | count
------------------------+-------
2017-10-23 00:00:00-04 |   132
(1 row)

This did indeed cut the execution time in half, but unfortunately it does not give the desired results, as seen here:

account=> SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;
   date    | count
------------+-------
2016-10-31 |    14
...
2017-10-23 |   584
(52 rows)

If I could find a cheaper way to aggregate these records by week, that would go a long way toward solving this problem.
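One thing worth checking (a sketch, not from the post itself): date_trunc('week', ...) truncates to the Monday of the ISO week, which is exactly the Monday the 'IYYY-IW' format-and-parse round trip lands on. So the two expressions produce identical week buckets, and the expensive text conversion is redundant as a group key. A quick Python verification of that equivalence over two full years, including the year boundaries:

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Mimics date_trunc('week', ts)::date: the Monday on or before d."""
    return d - timedelta(days=d.weekday())

def iso_round_trip(d: date) -> date:
    """Mimics to_timestamp(to_char(ts, 'IYYY-IW'), 'IYYY-IW')::date."""
    y, w, _ = d.isocalendar()
    return date.fromisocalendar(y, w, 1)

# Both expressions bucket every day to the same Monday.
d = date(2016, 1, 1)
while d <= date(2017, 12, 31):
    assert week_start(d) == iso_round_trip(d)
    d += timedelta(days=1)
```

If the buckets are identical, grouping on the plain truncation lets the formatting run once per group (52 rows) instead of once per scanned row.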


I'm open to any suggestions for improving the performance of this weekly GROUP BY query, including altering the table.

I created a materialized view as a test, but of course refreshing it takes exactly as long as the original query, so unless I only refresh it a few times per day it doesn't really help, at the cost of added complexity:

account=> CREATE MATERIALIZED VIEW account_activity_weekly AS SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;
SELECT 52

Per an additional comment, I modified my query as follows, which cut the execution time in half and does deliver the expected result set:

account=> EXPLAIN ANALYZE VERBOSE SELECT to_timestamp(to_char(date_trunc('week', event_time), 'IYYY-IW'), 'IYYY-IW')::date AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date_trunc('week', event_time) ORDER BY date;

Sort  (cost=724523.11..730249.97 rows=2290745 width=12) (actual time=3188.495..3188.496 rows=52 loops=1)
  Output: ((to_timestamp(to_char((date_trunc('week'::text, event_time)), 'IYYY-IW'::text), 'IYYY-IW'::text))::date), (count(DISTINCT account)), (date_trunc('week'::text, event_time))
  Sort Key: ((to_timestamp(to_char((date_trunc('week'::text, account_history.event_time)), 'IYYY-IW'::text), 'IYYY-IW'::text))::date)
  Sort Method: quicksort  Memory: 29kB
  ->  GroupAggregate  (cost=374662.50..443384.85 rows=2290745 width=12) (actual time=2573.694..3188.451 rows=52 loops=1)
        Output: (to_timestamp(to_char((date_trunc('week'::text, event_time)), 'IYYY-IW'::text), 'IYYY-IW'::text))::date, count(DISTINCT account), (date_trunc('week'::text, event_time))
        Group Key: (date_trunc('week'::text, account_history.event_time))
        ->  Sort  (cost=374662.50..380389.36 rows=2290745 width=12) (actual time=2566.086..2859.590 rows=2313889 loops=1)
              Output: (date_trunc('week'::text, event_time)), event_time, account
              Sort Key: (date_trunc('week'::text, account_history.event_time))
              Sort Method: external merge  Disk: 67816kB
              ->  Index Scan using account_history_time_idx on public.account_history  (cost=0.44..93524.23 rows=2290745 width=12) (actual time=0.090..1503.985 rows=2313889 loops=1)
                    Output: date_trunc('week'::text, event_time), event_time, account
                    Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
Planning time: 0.205 ms
Execution time: 3198.125 ms
(16 rows)

Thanks to those who contributed in the comments, I reduced the query time from ~8000 ms to ~1650 ms by:

  • Adding an index on the timestamp column only (~2000 ms improvement).
  • Removing the extra timestamp-to-text-to-timestamp conversion (or rather, adding date_trunc('week', event_time) to the GROUP BY clause) (~3000 ms improvement).

For reference, the current table structure and execution plan are below.

I did try other variations of multi-column indexes, but none of those indexes were used by the execution plan.

Also, I took the advice of another comment and carried out the following steps (followed by VACUUM and REINDEX):

  • Dropped the constraints from the description column and set all empty strings to NULL
  • Converted the timestamp column from WITHOUT TIME ZONE to WITH TIME ZONE
  • Increased work_mem to 100MB (via postgresql.conf).

ALTER TABLE account_history ALTER event_time TYPE timestamptz USING event_time AT TIME ZONE 'UTC';
ALTER TABLE account_history ALTER COLUMN description DROP NOT NULL;
ALTER TABLE account_history ALTER COLUMN description DROP DEFAULT;
UPDATE account_history SET description=NULL WHERE description='';
VACUUM FULL;
REINDEX TABLE account_history;

account=> show work_mem;
work_mem
----------
100MB

These additional changes shaved another 400 ms off the execution time and also cut the planning time. One thing to note is that the sort method changed from "external sort" to "external merge". Since the sort was still using "Disk", I increased work_mem to 200MB, which resulted in the quicksort (in-memory) method being used (176MB). That dropped a full second off the execution time (although that setting is really too high for our server instance).
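Some back-of-the-envelope arithmetic from the plans above suggests why work_mem had to be so much larger than the reported on-disk sort size: the in-memory sort representation carries per-tuple overhead, so the same ~2.3M rows that spilled ~40 MB to disk needed ~176 MB in memory. A rough sketch (the figures come from the plans; the per-row overhead is an inference, not a documented constant):

```python
rows = 2_312_418   # rows fed into the sort (from the final plan)
disk_kb = 40_688   # "Sort Method: external merge  Disk: 40688kB"
mem_kb = 176_662   # "Sort Method: quicksort  Memory: 176662kB"

disk_bytes_per_row = disk_kb * 1024 / rows
mem_bytes_per_row = mem_kb * 1024 / rows
print(round(disk_bytes_per_row))  # ~18 bytes/row spilled to disk
print(round(mem_bytes_per_row))   # ~78 bytes/row held in memory
```

In other words, setting work_mem just above the "Disk:" figure would not have been enough; roughly four times that was needed before the sort fit entirely in memory.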

The updated table and execution plan are below.


account=> \d account_history
                Table "public.account_history"
  Column    |           Type           |       Modifiers
-------------+--------------------------+------------------------
account     | integer                  | not null
event_code  | text                     | not null
event_time  | timestamp with time zone | not null default now()
description | text                     |
Indexes:
   "account_history_account_idx" btree (account)
   "account_history_account_time_idx" btree (event_time DESC, account)
   "account_history_time_idx" btree (event_time DESC)
Foreign-key constraints:
   "account_fk" FOREIGN KEY (account) REFERENCES account(id) ON UPDATE CASCADE ON DELETE RESTRICT
   "event_code_fk" FOREIGN KEY (event_code) REFERENCES domain_account_event(code) ON UPDATE CASCADE ON DELETE RESTRICT

account=> EXPLAIN ANALYZE VERBOSE SELECT date_trunc('week', event_time) AS date, count(DISTINCT account) FROM account_history WHERE event_time BETWEEN now() - interval '51 weeks' AND now() GROUP BY date ORDER BY date;

GroupAggregate  (cost=334034.60..380541.52 rows=2325346 width=12) (actual time=1307.742..1685.676 rows=52 loops=1)
  Output: (date_trunc('week'::text, event_time)), count(DISTINCT account)
  Group Key: (date_trunc('week'::text, account_history.event_time))
  ->  Sort  (cost=334034.60..339847.97 rows=2325346 width=12) (actual time=1303.565..1361.540 rows=2312418 loops=1)
        Output: (date_trunc('week'::text, event_time)), account
        Sort Key: (date_trunc('week'::text, account_history.event_time))
        Sort Method: quicksort  Memory: 176662kB
        ->  Index Only Scan using account_history_account_time_idx on public.account_history  (cost=0.44..88140.73 rows=2325346 width=12) (actual time=0.028..980.822 rows=2312418 loops=1)
              Output: date_trunc('week'::text, event_time), account
              Index Cond: ((account_history.event_time >= (now() - '357 days'::interval)) AND (account_history.event_time <= now()))
              Heap Fetches: 0
Planning time: 0.153 ms
Execution time: 1697.824 ms

I'm quite happy with the improvement so far, but I welcome any other contributions toward making this query faster, as it is still the slowest query I have in this view.

Quoted from: https://dba.stackexchange.com/questions/189190