看似簡單的查詢,但執行時間出人意料
我有以下架構:
並希望使用以下查詢獲得每個test_acts.code的最後****價格(基於transaction_datetime ):
SELECT code, price FROM test_transaction_acts, test_transactions, test_invoice_items, test_acts test_acts_main WHERE test_transaction_acts.test_transaction_id= test_transactions.test_transaction_id AND test_invoice_items.test_invoice_item_id= test_transaction_acts.test_invoice_item_id AND test_acts_main.test_act_id= test_invoice_items.test_act_id AND test_transaction_acts.price IS NOT NULL AND test_transactions.transaction_datetime = ( SELECT MAX(transaction_datetime) FROM test_transaction_acts, test_transactions, test_invoice_items, test_acts WHERE test_transaction_acts.test_transaction_id= test_transactions.test_transaction_id AND test_invoice_items.test_invoice_item_id = test_transaction_acts.test_invoice_item_id AND test_invoice_items.test_act_id = test_acts.test_act_id AND test_acts.code = test_acts_main.code AND test_transaction_acts.price IS NOT NULL) order by code
這些表格包含以下資訊量:
- test_transaction_acts = 2240823 行
- test_transactions = 832746 行
- test_invoice_items = 2421058 行
- test_acts = 24 行
結果如下:
**我的問題是:**使用索引,我無法在 14 秒內獲得查詢,我無法確定這是否正常,我只是應該更聰明地儲存資訊,以便更容易訪問我的目的,或者我是否應該繼續探勘以使我的查詢更快。
我知道一些“test_acts”僅用於舊的“test_transactions”,這意味著數據庫必須經過很多“test_transactions”才能真正使用所需的“程式碼”,這樣才能解釋查詢的長度。
我改寫了我的查詢以使用不同的命名法來嘗試查看查詢本身是否是問題,我查看了解釋分析以確保正確使用索引,嘗試盡可能多地理解解釋並測試各種多列嘗試查看結果差異的索引:
Sort (cost=2673720.31..2673748.25 rows=11177 width=33) (actual time=16972.155..16972.156 rows=24 loops=1) Sort Key: test_acts_main.code Sort Method: quicksort Memory: 26kB -> Hash Join (cost=184630.73..2672968.75 rows=11177 width=33) (actual time=16677.629..16972.074 rows=24 loops=1) Hash Cond: (((test_transaction_acts.test_invoice_item_id)::bigint = test_invoice_items.test_invoice_item_id) AND (test_acts_main.test_act_id = (test_invoice_items.test_act_id)::bigint)) -> Nested Loop (cost=99195.28..2566538.13 rows=279421 width=45) (actual time=1828.289..15974.502 rows=400 loops=1) -> Nested Loop (cost=99194.85..2501879.00 rows=104093 width=36) (actual time=1828.266..15973.431 rows=145 loops=1) -> Index Scan using test_acts_test_act_id_idx on test_acts test_acts_main (cost=0.14..12.51 rows=25 width=32) (actual time=0.017..0.040 rows=25 loops=1) -> Index Scan using test_transactions_transaction_datetime_idx on test_transactions (cost=99194.71..100033.02 rows=4164 width=12) (actual time=0.030..0.034 rows=6 loops=25) Index Cond: (transaction_datetime = (SubPlan 1)) SubPlan 1 -> Aggregate (cost=99194.27..99194.28 rows=1 width=8) (actual time=638.893..638.893 rows=1 loops=25) -> Hash Join (cost=29468.13..98970.74 rows=89415 width=8) (actual time=252.679..633.311 rows=89420 loops=25) Hash Cond: ((test_transaction_acts_1.test_transaction_id)::bigint = test_transactions_1.test_transaction_id) -> Nested Loop (cost=2162.34..65769.09 rows=89415 width=8) (actual time=5.944..248.034 rows=89420 loops=25) -> Nested Loop (cost=2161.91..17844.20 rows=96842 width=4) (actual time=4.661..57.592 rows=96842 loops=25) -> Seq Scan on test_acts (cost=0.00..1.31 rows=1 width=4) (actual time=0.004..0.006 rows=1 loops=25) Filter: ((code)::text = (test_acts_main.code)::text) Rows Removed by Filter: 24 -> Bitmap Heap Scan on test_invoice_items test_invoice_items_1 (cost=2161.91..16690.01 rows=115288 width=12) (actual time=4.635..48.117 rows=96842 loops=25) Recheck Cond: ((test_act_id)::bigint = test_acts.test_act_id) Heap Blocks: exact=130166 -> Bitmap Index Scan on test_invoice_items_test_act_id_idx (cost=0.00..2133.09 rows=115288 width=0) (actual time=4.097..4.097 rows=96842 loops=25) Index Cond: ((test_act_id)::bigint = test_acts.test_act_id) -> Index Scan using test_transaction_acts_test_invoice_item_id_idx on test_transaction_acts test_transaction_acts_1 (cost=0.43..0.48 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=2421058) Index Cond: ((test_invoice_item_id)::bigint = test_invoice_items_1.test_invoice_item_id) Filter: (price IS NOT NULL) Rows Removed by Filter: 0 -> Hash (cost=12829.46..12829.46 rows=832746 width=12) (actual time=241.633..241.633 rows=832746 loops=25) Buckets: 131072 Batches: 16 Memory Usage: 3262kB -> Seq Scan on test_transactions test_transactions_1 (cost=0.00..12829.46 rows=832746 width=12) (actual time=0.012..99.733 rows=832746 loops=25) -> Index Scan using test_transaction_acts_test_transaction_id_idx on test_transaction_acts (cost=0.43..0.59 rows=3 width=21) (actual time=0.005..0.006 rows=3 loops=145) Index Cond: ((test_transaction_id)::bigint = test_transactions.test_transaction_id) Filter: (price IS NOT NULL) -> Hash (cost=37297.58..37297.58 rows=2421058 width=12) (actual time=696.049..696.049 rows=2421058 loops=1) Buckets: 131072 Batches: 64 Memory Usage: 2795kB -> Seq Scan on test_invoice_items (cost=0.00..37297.58 rows=2421058 width=12) (actual time=0.026..200.291 rows=2421058 loops=1) Planning time: 2.175 ms Execution time: 16972.606 ms
如果我不能讓查詢執行得更快,我的結論是每次為每個程式碼創建一個新事務時儲存最後一個****價格,有效地在數據庫中創建資訊的副本,以便我可以隨時獲得資訊我需要它。
在進行這樣的“解決方法”機制之前,我需要確保沒有更好的方法來進行這種查詢,或者我可以完全理解解釋/分析以查看是否有更快的速度可供我使用詢問。
我也明白我可以“簡化”數據庫設計,例如,將 test_acts.code 包含在 test_invoice_items 或 test_transaction_acts 中,但我試圖盡可能地將概念分開以表示真實的業務邏輯概念並避免資訊重複。
您將採取哪些步驟來嘗試更快地獲取此查詢?您是否願意在數據庫中複製資訊,只是為了擁有一個可以解決性能問題的格式的副本?擁有不同格式的數據副本是否被認為是不好的做法?我希望您希望盡可能避免這樣做。
如果有幫助,這裡是我最新範例中使用的索引,之前發布的分析說明使用了這些索引:
CREATE INDEX test_transaction_acts_test_invoice_item_id_idx ON test_transaction_acts (test_invoice_item_id); CREATE INDEX test_transaction_acts_test_transaction_act_id_idx ON test_transaction_acts (test_transaction_act_id); CREATE INDEX test_transaction_acts_test_transaction_id_idx ON test_transaction_acts (test_transaction_id); CREATE INDEX test_transactions_test_transaction_id_idx ON test_transactions (test_transaction_id); CREATE INDEX test_transactions_transaction_datetime_idx ON test_transactions (transaction_datetime); CREATE INDEX test_invoice_items_test_act_id_idx ON test_invoice_items (test_act_id); CREATE INDEX test_invoice_items_test_invoice_item_id_idx ON test_invoice_items (test_invoice_item_id); CREATE INDEX test_acts_test_act_id_idx ON test_acts (test_act_id);
這是我關於堆棧交換的第一個問題,我嘗試遵循所有準則,如果我搞砸了,請告訴我!
您的時間花在子查詢的多次重複上。嘗試使用以下方法更聰明地編寫
DISTINCT ON
:SELECT DISTINCT ON (ta.code) ta.code, tta.price FROM test_transaction_acts AS tta JOIN test_transactions AS tt USING (test_transaction_id) JOIN test_invoice_items AS tii USING (test_invoice_item_id) JOIN test_acts AS ta USING (test_act_id) WHERE tta.price IS NOT NULL ORDER BY ta.code, tt.transaction_datetime DESC;
一些風格提示:
- 使用標準
JOIN
語法以提高可讀性並避免忘記連接條件- 為了簡潔和可讀性,使用表別名
- 如果查詢包含多個表,則在列引用前加上表名