Redshift

What can I do to speed up this query that aggregates over a period of time?

  • July 28, 2019

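I have a star schema database in Redshift. I am running an aggregation on the fact table order_facts, which has the following relevant columns:

total - FLOAT - total cost of the order

payment_date - INTEGER - date of payment, in YYYYMMDD format

shop_id - INTEGER - foreign key to the shops dimension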

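I am using the following dimension tables:

dates

id - INTEGER - date in YYYYMMDD format

date - DATE - the same date as a SQL DATE field

shops

id - INTEGER - primary key of the table

created - DATETIME - timestamp of when the shop was created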

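My goal is to get a table of the total order volume for each shop's first 30 days.

My query looks like this and should give the correct answer. However, it takes more than half an hour to execute: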

SELECT of.shop_id, SUM(of.total) FROM order_facts of
INNER JOIN shops s
ON of.shop_id = s.id
INNER JOIN dates d
ON of.payment_date = d.id
WHERE d.date <= (s.created + INTERVAL '30 days')
GROUP BY of.shop_id

I tried rewriting it like this, but the query still had not finished after running for more than 20 minutes:

SELECT SUM(r.total), r.shop_id
FROM (
   SELECT s.created, of.shop_id, of.total, of.payment_date
   FROM order_facts of
   INNER JOIN shops s
   ON of.shop_id = s.id
) r
INNER JOIN dates d
ON r.payment_date = d.id
WHERE d.date <= (r.created + INTERVAL '30 days')
GROUP BY r.shop_id

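I currently don't understand why this takes so long to execute. Understanding that would help me work out how to fix the query. Likewise, seeing a better version of the query above would help me see where the inefficiency lies. Either would be very helpful.

Getting each shop's total order volume for only the last X days, or for all time, is a very fast query. So it seems that I am doing something suboptimal when joining the dates table, but it is not clear what.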

Edit: output of EXPLAIN:

XN HashAggregate (cost=868301988.49..868301991.01 rows=1009 width=12)
  -> XN Merge Join DS_DIST_NONE (cost=0.00..868286973.07 rows=3003084 width=12)
        Merge Cond: ("outer".id = "inner".payment_date)
        Join Filter: ("inner".shop_id = "outer".id)
        -> XN Nested Loop DS_BCAST_INNER (cost=0.00..1773635510.00 rows=32472000 width=8)
              Join Filter: (("outer".date)::timestamp without time zone > ("inner".created + '30 days'::interval))
              -> XN Seq Scan on dates d (cost=0.00..110.00 rows=11000 width=8)
              -> XN Seq Scan on shops s (cost=0.00..88.56 rows=8856 width=12)
        -> XN Seq Scan on order_facts "of" (cost=0.00..90092.52 rows=9009252 width=16)
----- Nested Loop Join in the query plan - review the join predicates to avoid Cartesian products -----
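
The warning on the last line of the plan points at the likely cause. The only predicate linking dates and shops is the inequality on d.date and s.created, so the planner pairs the two tables with a nested loop: 11,000 date rows against 8,856 shop rows is roughly 97 million candidate pairs, and all of that happens before order_facts is joined. As a hedged sketch (not run against this cluster), one way to sidestep that join is to convert the cutoff into the same YYYYMMDD integer form as payment_date and compare the two directly, so the dates table is not needed at all. The alias o and the output column name are choices made for this example. It also assumes every payment_date is a valid YYYYMMDD integer with a matching row in dates, since the original inner join would otherwise have dropped such rows.

```sql
-- Sketch only: filters on the integer date key, avoiding the join to dates.
-- Assumption: payment_date is always a valid YYYYMMDD integer present in dates.
SELECT o.shop_id,
       SUM(o.total) AS first_thirty_day_total
FROM order_facts o
INNER JOIN shops s
        ON o.shop_id = s.id
-- Render the cutoff timestamp as YYYYMMDD text, then cast it to an integer
-- so it can be compared with payment_date; YYYYMMDD integers sort in date order.
WHERE o.payment_date <= CAST(TO_CHAR(s.created + INTERVAL '30 days', 'YYYYMMDD') AS INTEGER)
GROUP BY o.shop_id;
```

The remaining join is a plain equality on shop_id, which leaves no inequality-only join for the planner to resolve with a nested loop. Whether this is actually faster on a given cluster would need to be checked with EXPLAIN.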

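I made a subquery that builds daily order totals per shop, and then ran the aggregation on that result. Execution time is about 6 seconds. The final query looks like this: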

SELECT s.shop_name, rrr.first_thirty_day_total FROM (
   SELECT SUM(rr.daily_total) AS first_thirty_day_total, rr.shop_id FROM (
       SELECT d.date, r.daily_total, r.shop_id FROM (
           SELECT SUM(of.total) AS daily_total, of.shop_id, of.payment_date
           FROM order_facts of
           GROUP BY shop_id, payment_date
       ) r
       INNER JOIN dates d
       ON r.payment_date = d.id
   ) rr
   INNER JOIN shops s
   ON s.id = rr.shop_id
   WHERE rr.date <= s.created + INTERVAL '30 days'
   GROUP BY rr.shop_id
) rrr
INNER JOIN shops s
ON rrr.shop_id = s.id

Source: https://dba.stackexchange.com/questions/243919