Query

查詢事件表以進行報告

  • May 31, 2012

背景 (用文字問題掩蓋現實世界的問題)

Farmer Brown 在加利福尼亞州有一個桃園,並在全國各地運送數十萬個桃子。當一批桃子到達目的地時,送貨員會檢查桃子並保留所有未通過檢查的桃子。

當未通過檢查的桃子返回 Farmer Brown 的農場時,桃子缺陷將在 Farmer Brown 的桃子醫院進行修復,並與下一批桃子一起重新裝運。如果原來的桃子再次檢查不合格,只要桃子可以打撈,它們就會經歷同樣的過程。

桃子醫院很貴,運費也很貴,所以 Farmer Brown 開始思考,“對於我在任何時間範圍內運送的桃子,我總共運送了多少?其中,第二次運送了多少?第三次(或更多)?”

在高層次上,peach 發貨跟踪表(在 SQL Server 2000 數據庫上)有超過900 萬條不同類型的發貨記錄,看起來(大致)類似於:

ShipmentDate   PeachID    Shipment Type
8/10/2005         7112      Shipment to Customer
7/15/2007         8798      Shipment to Customer
4/3/2009          8798      Shipment to Customer
4/15/2009         8798      Shipment to Customer
4/21/2009         8798      Shipment to Customer
4/21/2009         145751    Shipment to Customer
4/22/2009         7112      Shipment to Customer
4/22/2009         12121     Shipment to Customer
4/25/2009         8798      Shipment to Customer
5/12/2009         8798      Shipment to Customer

如果我們嘗試返回 Farmer Brown 正在尋找的樣本,結果將是這樣的:

給定以下參數:

StartDate = 4/1/2009, EndDate = 4/30/2009

結果

——最簡單的部分

運送給客戶的桃子總數:100,000

(PeachID 8798 佔總數的 4。PeachID 7112 佔總數的 1。)

– 亂七八糟的部分

Total Peachs Shiped for the 2nd time : 5,000

(PeachID 8798 貢獻 1. PeachID 7112 貢獻 1.)

第三次(或更多)運送的桃子總數:800

(PeachID 8798 佔總數的 2。)

問題

建構此數據的保存表的最有效方法是什麼(每天刷新一次 - 實時並不重要),以便在檢索 Fa​​rmer Brown 請求的數據時,可以快速返回數據?我在想,如果從 Farmer Brown 開業之日開始到現在的出貨量按天匯總到一個持有表中,然後對任何日期範圍參數進行匯總,則 proc 的執行速度將比查詢 Shipments 表及其數百萬條記錄。

抱歉,如果我沒有足夠清楚地解釋情況。我真的很欣賞對此的外部觀點。

我會PriorShipments在事實表中添加一個欄位。您將有很多行,但是在ShipDate欄位上進行分群應該非常有效,並且查詢會非常簡單。

CREATE TABLE #Shipments
(
ID  INT NOT NULL,
ShipDate    DATE NOT NULL,
PRIMARY KEY (ID, ShipDate),
CustomerID  INT NOT NULL,
PriorShipments  INT NOT NULL DEFAULT 0
)
INSERT INTO #Shipments (ShipDate, ID, CustomerID) VALUES
   ('2005-08-10',   7112,  942), 
   ('2007-07-15',   8798,  160), 
   ('2009-04-03',   8798,  160), 
   ('2009-04-15',   8798,  160), 
   ('2009-04-21',   8798,  160), 
   ('2009-04-21', 145751,  139), 
   ('2009-04-22',   7112,  942), 
   ('2009-04-22',  12121, 1015), 
   ('2009-04-25',   8798,  160), 
   ('2009-05-12',   8798,  160)

UPDATE RS
SET PriorShipments = NumPrev
FROM #Shipments AS RS
   INNER JOIN
       (
       SELECT S1.ID, S1.ShipDate, COUNT(*) AS NumPrev
       FROM #Shipments AS S1
           INNER JOIN #Shipments AS S2 ON S1.ID = S2.ID AND S2.ShipDate < S1.ShipDate
       GROUP BY S1.ID, S1.ShipDate
       ) AS Seq ON RS.ID = Seq.ID AND RS.ShipDate = Seq.ShipDate

-- If Farmer Brown gets a Dept of Agriculture grant to upgrade his database, he could instead use:
UPDATE RS
SET PriorShipments = Seq - 1
FROM #Shipments AS RS
   INNER JOIN
       (
       SELECT ID, ShipDate, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ShipDate) AS Seq
       FROM #Shipments
       ) AS Seq ON RS.ID = Seq.ID AND RS.ShipDate = Seq.ShipDate

-- The good part
SELECT COUNT(*) FROM #Shipments WHERE ShipDate BETWEEN '2009-04-01' AND '2009-04-30'
SELECT COUNT(*) FROM #Shipments WHERE ShipDate BETWEEN '2009-04-01' AND '2009-04-30' AND PriorShipments = 1
SELECT COUNT(*) FROM #Shipments WHERE ShipDate BETWEEN '2009-04-01' AND '2009-04-30' AND PriorShipments >= 2

你可以預先匯總一下,只要 Farmer Brown 永遠不會關心他一直在向哪些農場運送桃子。

搬到鄉下,我會吃很多桃子…

引用自:https://dba.stackexchange.com/questions/18579