查詢事件表以進行報告
背景 (用文字問題掩蓋現實世界的問題)
Farmer Brown 在加利福尼亞州有一個桃園,並在全國各地運送數十萬個桃子。當一批桃子到達目的地時,送貨員會檢查桃子並保留所有未通過檢查的桃子。
當未通過檢查的桃子返回 Farmer Brown 的農場時,桃子缺陷將在 Farmer Brown 的桃子醫院進行修復,並與下一批桃子一起重新裝運。如果原來的桃子再次檢查不合格,只要桃子可以打撈,它們就會經歷同樣的過程。
桃子醫院很貴,運費也很貴,所以 Farmer Brown 開始思考,“對於我在任何時間範圍內運送的桃子,我總共運送了多少?其中,第二次運送了多少?第三次(或更多)?”
在高層次上,peach 發貨跟踪表(在 SQL Server 2000 數據庫上)有超過900 萬條不同類型的發貨記錄,看起來(大致)類似於:
ShipmentDate PeachID Shipment Type 8/10/2005 7112 Shipment to Customer 7/15/2007 8798 Shipment to Customer 4/3/2009 8798 Shipment to Customer 4/15/2009 8798 Shipment to Customer 4/21/2009 8798 Shipment to Customer 4/21/2009 145751 Shipment to Customer 4/22/2009 7112 Shipment to Customer 4/22/2009 12121 Shipment to Customer 4/25/2009 8798 Shipment to Customer 5/12/2009 8798 Shipment to Customer
如果我們嘗試返回 Farmer Brown 正在尋找的樣本,結果將是這樣的:
給定以下參數:
StartDate = 4/1/2009, EndDate = 4/30/2009
結果
——最簡單的部分
運送給客戶的桃子總數:100,000
(PeachID 8798 佔總數的 4。PeachID 7112 佔總數的 1。)
– 亂七八糟的部分
Total Peachs Shiped for the 2nd time : 5,000
(PeachID 8798 貢獻 1. PeachID 7112 貢獻 1.)
第三次(或更多)運送的桃子總數:800
(PeachID 8798 佔總數的 2。)
問題
建構此數據的保存表的最有效方法是什麼(每天刷新一次 - 實時並不重要),以便在檢索 Farmer Brown 請求的數據時,可以快速返回數據?我在想,如果從 Farmer Brown 開業之日開始到現在的出貨量按天匯總到一個持有表中,然後對任何日期範圍參數進行匯總,則 proc 的執行速度將比查詢 Shipments 表及其數百萬條記錄。
抱歉,如果我沒有足夠清楚地解釋情況。我真的很欣賞對此的外部觀點。
我會
PriorShipments
在事實表中添加一個欄位。您將有很多行,但是在ShipDate
欄位上進行分群應該非常有效,並且查詢會非常簡單。CREATE TABLE #Shipments ( ID INT NOT NULL, ShipDate DATE NOT NULL, PRIMARY KEY (ID, ShipDate), CustomerID INT NOT NULL, PriorShipments INT NOT NULL DEFAULT 0 ) INSERT INTO #Shipments (ShipDate, ID, CustomerID) VALUES ('2005-08-10', 7112, 942), ('2007-07-15', 8798, 160), ('2009-04-03', 8798, 160), ('2009-04-15', 8798, 160), ('2009-04-21', 8798, 160), ('2009-04-21', 145751, 139), ('2009-04-22', 7112, 942), ('2009-04-22', 12121, 1015), ('2009-04-25', 8798, 160), ('2009-05-12', 8798, 160) UPDATE RS SET PriorShipments = NumPrev FROM #Shipments AS RS INNER JOIN ( SELECT S1.ID, S1.ShipDate, COUNT(*) AS NumPrev FROM #Shipments AS S1 INNER JOIN #Shipments AS S2 ON S1.ID = S2.ID AND S2.ShipDate < S1.ShipDate GROUP BY S1.ID, S1.ShipDate ) AS Seq ON RS.ID = Seq.ID AND RS.ShipDate = Seq.ShipDate -- If Farmer Brown gets a Dept of Agriculture grant to upgrade his database, he could instead use: UPDATE RS SET PriorShipments = Seq - 1 FROM #Shipments AS RS INNER JOIN ( SELECT ID, ShipDate, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ShipDate) AS Seq FROM #Shipments ) AS Seq ON RS.ID = Seq.ID AND RS.ShipDate = Seq.ShipDate -- The good part SELECT COUNT(*) FROM #Shipments WHERE ShipDate BETWEEN '2009-04-01' AND '2009-04-30' SELECT COUNT(*) FROM #Shipments WHERE ShipDate BETWEEN '2009-04-01' AND '2009-04-30' AND PriorShipments = 1 SELECT COUNT(*) FROM #Shipments WHERE ShipDate BETWEEN '2009-04-01' AND '2009-04-30' AND PriorShipments >= 2
你可以預先匯總一下,只要 Farmer Brown 永遠不會關心他一直在向哪些農場運送桃子。
搬到鄉下,我會吃很多桃子…