WITH-clause 沒有預先計算,線性資源浪費,為什麼?
我正在嘗試節省資源並縮短程式碼,但使用“WITH-clause”會浪費線性資源。
這是我的基本範例:
#!/bin/bash echo "CREATE TABLE 'clients_table' ( id integer primary key, \ name text DEFAULT 'noname', \ param1 int DEFAULT 0, \ param2 int DEFAULT 0, \ param3 int DEFAULT 0, \ param4 int DEFAULT 0, \ param5 int DEFAULT 0 \ );" # dimension=100 # dimension=20 dimension=15 # 1m16s # dimension=10 # 16 seconds. # dimension=2 for i1 in `seq 1 $dimension`;do echo "$(date): i1: $i1" > /dev/stderr for i2 in `seq 1 $dimension`;do for i3 in `seq 1 $dimension`;do for i4 in `seq 1 $dimension`;do for i5 in `seq 1 $dimension`;do echo "INSERT INTO clients_table ( name, param1, param2, param3, param4, param5 ) VALUES ( \ 'c${i1}_${i2}_${i3}_${i4}_${i5}', $i1, $i2, $i3, $i4, $i5 );" done done done done done
在 1 分 16 秒內,維度為 15,我通過以下命令獲取 sqlite 數據庫:
# time bash gen_base.sh | sqlite3 ram_fs/tmp.db ... real 1m16.769s user 1m23.690s sys 0m53.379s # du -hs ram_fs/tmp.db 23M ram_fs/tmp.db
此後,我將這樣的 sql 用於不同的客戶端,通過每個參數,一點點來自這個,一點點來自那個……
WITH work_set AS ( SELECT * FROM clients_table WHERE param1 % 2 == 0 ORDER BY param1 ASC, param2 DESC, param3 ASC, param4 DESC, param5 ASC ), odd2 AS ( SELECT * FROM work_set WHERE param2 % 2 == 1 ), even2 AS ( SELECT * FROM work_set WHERE param2 % 2 == 0 ), /* --------------------------- */ odd2odd3 AS ( SELECT * FROM odd2 WHERE param3 % 2 == 1 ), odd2even3 AS ( SELECT * FROM odd2 WHERE param3 % 2 == 0 ), even2odd3 AS ( SELECT * FROM even2 WHERE param3 % 2 == 1 ), even2even3 AS ( SELECT * FROM even2 WHERE param3 % 2 == 0 ), /*-----------------------------*/ odd2odd3odd4 AS ( SELECT * FROM odd2odd3 WHERE param4 % 2 == 1 ), odd2odd3even4 AS ( SELECT * FROM odd2odd3 WHERE param4 % 2 == 0 ), odd2even3odd4 AS ( SELECT * FROM odd2even3 WHERE param4 % 2 == 1 ), odd2even3even4 AS ( SELECT * FROM odd2even3 WHERE param4 % 2 == 0 ), /*--*/ even2odd3odd4 AS ( SELECT * FROM even2odd3 WHERE param4 % 2 == 1 ), even2odd3even4 AS ( SELECT * FROM even2odd3 WHERE param4 % 2 == 0 ), even2even3odd4 AS ( SELECT * FROM even2even3 WHERE param4 % 2 == 1 ), even2even3even4 AS ( SELECT * FROM even2even3 WHERE param4 % 2 == 0 ), /* --------------------------- */ c1 AS ( SELECT * FROM odd2odd3odd4 WHERE param5 % 2 == 1 LIMIT 1 ), c2 AS ( SELECT * FROM odd2odd3odd4 WHERE param5 % 2 == 0 LIMIT 1 ), /*--*/ c3 AS ( SELECT * FROM odd2odd3even4 WHERE param5 % 2 == 1 LIMIT 1 ), c4 AS ( SELECT * FROM odd2odd3even4 WHERE param5 % 2 == 0 LIMIT 1 ), /*--*/ c5 AS ( SELECT * FROM odd2even3odd4 WHERE param5 % 2 == 1 LIMIT 1 ), c6 AS ( SELECT * FROM odd2even3odd4 WHERE param5 % 2 == 0 LIMIT 1 ), /*--*/ c7 AS ( SELECT * FROM odd2even3even4 WHERE param5 % 2 == 1 LIMIT 1 ), c8 AS ( SELECT * FROM odd2even3even4 WHERE param5 % 2 == 0 LIMIT 1 ), /*--*/ c9 AS ( SELECT * FROM even2odd3odd4 WHERE param5 % 2 == 1 LIMIT 1 ), c10 AS ( SELECT * FROM even2odd3odd4 WHERE param5 % 2 == 0 LIMIT 1 ), /*--*/ c11 AS ( SELECT * FROM even2odd3even4 WHERE param5 % 2 == 1 LIMIT 1 ), c12 AS ( SELECT * FROM even2odd3even4 WHERE param5 % 2 == 0 LIMIT 1 ), /*--*/ c13 AS ( SELECT * FROM even2even3odd4 WHERE param5 % 2 == 1 LIMIT 1 ), c14 AS ( SELECT * FROM even2even3odd4 WHERE param5 % 2 == 0 LIMIT 1 ), /*--*/ c15 AS ( SELECT * FROM even2even3even4 WHERE param5 % 2 == 1 LIMIT 1 ), c16 AS ( SELECT * FROM even2even3even4 WHERE param5 % 2 == 0 LIMIT 1 ) /* --------------------------- */ SELECT * FROM c1 UNION ALL SELECT * FROM c2 UNION ALL SELECT * FROM c3 UNION ALL SELECT * FROM c4 UNION ALL SELECT * FROM c5 UNION ALL SELECT * FROM c6 UNION ALL SELECT * FROM c7 UNION ALL SELECT * FROM c8 UNION ALL SELECT * FROM c9 /*UNION ALL SELECT * FROM c10 UNION ALL SELECT * FROM c11 UNION ALL SELECT * FROM c12 UNION ALL SELECT * FROM c13 UNION ALL SELECT * FROM c14 UNION ALL SELECT * FROM c15 UNION ALL SELECT * FROM c16 */
我正在嘗試
/*
在 sql 末尾移動註釋的開頭,以區分 UNION 的數量並測量時間:# time cat with_select_bug.sql | sqlite3 ram_fs/tmp.db
- 0m0.092s - 只有一個客戶端。
- 0m0.183s - 兩個客戶端。
- 0m0.274s - 三個客戶端。
- …
- 0m0.678s - 八個客戶端。
- …
- 0m1.336s - 所有十六個客戶端。
正如您所看到的,CPU 時間使用率線性增長,一個客戶端 - 比兩個客戶端快兩倍,所有客戶端 - 想要從我的 CPU 中佔用整整一秒。為什麼?
我已經聲明了
WITH-clause
,對吧?-SELECT
應預先計算,所有客戶端應僅過濾先前計算數據的結果。此外,如果我多次詢問同一個客戶:
... SELECT * FROM c16 UNION ALL SELECT * FROM c16 UNION ALL SELECT * FROM c16 UNION ALL SELECT * FROM c16 UNION ALL ...
每次選擇都會給我同一個客戶,但是每個人都會分開計算!!像
c16
- 的聲明一樣不存在……我可以放 100 行
SELECT * FROM c16 UNION ALL
,它會比 1 慢 100 倍。為什麼這樣?問題:
- 為什麼增長是線性的?
- 我怎樣才能提高速度?
- **是“WITH-clause” - 只是為了縮短程式碼?**像宏一樣,沒有加速功能……
您所指的“WITH 子句”是CTE。它允許您多次重用程式碼但只編寫一次。
您的程式碼有一個我從未見過表現良好的模式 - 多個嵌套的 CTE。您編寫的每個 CTE 都會執行多次 - 每次呼叫它時執行一次。如果您發布您的執行計劃,它應該確認這一點。
這篇文章比我更雄辯地解釋了,多次呼叫同一個 CTE 將如何導致優化器在幕後多次執行 CTE。
是“WITH-clause” - 只是為了縮短程式碼?像宏一樣,沒有加速功能……
我會說 CTE 是一種可讀性工具,如果你願意的話,它是一種語法糖。它不僅沒有任何“加速功能”,而且實際上具有與上述文章相反的效果。
我怎樣才能提高速度?
很難從程式碼片段中判斷您的程式碼試圖實現什麼,但可能值得研究
CROSS APPLY
文章中的解決方案,或者考慮使用只會執行一次相關讀取的臨時表,而不是 CTE。您可以通過在程式碼開頭添加來驗證您所做的更改是否使事情變得更好
SET STATISTICS IO, TIME ON
- 如果在您進行程式碼更改後您的讀取和 CPU 時間減少,您的程式碼性能更高。