Why are set-returning functions (SRFs) slower in the FROM clause?
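**This is a database internals question.** I am using PostgreSQL 9.5, and I am wondering why set-returning functions (SRFs), also known as table-valued functions (TVFs), run slower when they are in the `FROM` clause. For example, when I run these commands,

    CREATE TABLE foo AS SELECT * FROM generate_series(1,1e7);
    SELECT 10000000
    Time: 5573.574 ms

it is always slower than,

    CREATE TABLE foo AS SELECT generate_series(1,1e7);
    SELECT 10000000
    Time: 4622.567 ms

Is there a general rule to be made here, such that we should always run set-returning functions outside of the `FROM` clause?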
Let's start by comparing the execution plans:

    tinker=> EXPLAIN ANALYZE SELECT * FROM generate_series(1,1e7);
                                                               QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------
     Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=32) (actual time=2382.582..4291.136 rows=10000000 loops=1)
     Planning time: 0.022 ms
     Execution time: 5539.522 ms
    (3 rows)

    tinker=> EXPLAIN ANALYZE SELECT generate_series(1,1e7);
                                               QUERY PLAN
    -------------------------------------------------------------------------------------------------
     Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.008..2622.365 rows=10000000 loops=1)
     Planning time: 0.045 ms
     Execution time: 3858.661 ms
    (3 rows)
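Okay, so now we know that `SELECT * FROM generate_series()` is executed using a `Function Scan` node, while `SELECT generate_series()` is executed using a `Result` node. Whatever causes these queries to perform differently boils down to the difference between these two nodes, so we know exactly where to look.

Another interesting thing in the `EXPLAIN ANALYZE` output: note the timings. `SELECT generate_series()` is `actual time=0.008..2622.365`, while `SELECT * FROM generate_series()` is `actual time=2382.582..4291.136`. The `Function Scan` node **starts** returning records at about the time the `Result` node has finished returning them.

What was PostgreSQL doing between `t=0` and `t=2382` in the `Function Scan` plan? That is roughly how long it takes to run `generate_series()`, so I would wager that is exactly what it was doing. The answer begins to take shape: `Result` seems to return rows immediately, while `Function Scan` seems to materialize the results and then scan them.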
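To make the timing comparison concrete, here is a small Python sketch that pulls the startup and total times out of the two plan lines quoted above. The helper name `actual_times` and the regular expression are my own, written for this illustration; they are not part of PostgreSQL or any client library.

```python
import re

# Matches the "actual time=<startup>..<total>" part of an EXPLAIN ANALYZE plan line.
ACTUAL_TIME = re.compile(r"actual time=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)")


def actual_times(plan_line):
    """Return (startup_ms, total_ms) for one plan line, or None if it has no timing."""
    match = ACTUAL_TIME.search(plan_line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# The two node lines from the plans in this answer.
function_scan = (
    "Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=32) "
    "(actual time=2382.582..4291.136 rows=10000000 loops=1)"
)
result = (
    "Result  (cost=0.00..5.01 rows=1000 width=0) "
    "(actual time=0.008..2622.365 rows=10000000 loops=1)"
)

for name, line in (("Function Scan", function_scan), ("Result", result)):
    startup_ms, total_ms = actual_times(line)
    print(f"{name}: first row after {startup_ms} ms, last row after {total_ms} ms")
```

The first number is the time until the node emits its first row, the second is the time until it emits its last one, which is why the large startup value on `Function Scan` stands out.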
With `EXPLAIN` out of the way, let's check the implementation. The `Result` node lives in `nodeResult.c`, which says:

     * DESCRIPTION
     *
     *		Result nodes are used in queries where no relations are scanned.

The code is simple enough.
`Function Scan` lives in `nodeFunctionScan.c`, and indeed it appears to take a two-phase execution strategy:

    /*
     * If first time through, read all tuples from function and put them
     * in a tuplestore. Subsequent calls just fetch tuples from
     * tuplestore.
     */
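The difference between the two strategies can be sketched with Python generators. This is only an analogy, not PostgreSQL's C code: `generate_series`, `result_style`, `function_scan_style` and `events_before_first_row` are hypothetical names invented here, and a plain list stands in for the tuplestore.

```python
def generate_series(start, stop, log):
    """Hypothetical stand-in for an SRF: produces one value at a time and logs it."""
    for value in range(start, stop + 1):
        log.append(("produced", value))
        yield value


def result_style(srf, log):
    """Streaming: pass each value to the caller as soon as the function produces it."""
    for value in srf:
        log.append(("returned", value))
        yield value


def function_scan_style(srf, log):
    """Two-phase: first drain the function into a store, then scan the store."""
    store = list(srf)  # phase 1: read every value from the function
    for value in store:  # phase 2: fetch values back from the store
        log.append(("returned", value))
        yield value


def events_before_first_row(style):
    """Run one strategy, pull only its first row, and return what happened so far."""
    log = []
    rows = style(generate_series(1, 5, log), log)
    next(rows)
    return log


print(events_before_first_row(result_style))
print(events_before_first_row(function_scan_style))
```

In the streaming version only one value has been produced when the first row comes back; in the two-phase version all five have, which mirrors the late startup time seen in the `Function Scan` plan.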
And for clarity, let's see what a `tuplestore` is:

     * tuplestore.h
     *	  Generalized routines for temporary tuple storage.
     *
     * This module handles temporary storage of tuples for purposes such
     * as Materialize nodes, hashjoin batch files, etc.  It is essentially
     * a dumbed-down version of tuplesort.c; it does no sorting of tuples
     * but can only store and regurgitate a sequence of tuples.  However,
     * because no sort is required, it is allowed to start reading the sequence
     * before it has all been written.  This is particularly useful for cursors,
     * because it allows random access within the already-scanned portion of
     * a query without having to process the underlying scan to completion.
     * Also, it is possible to support multiple independent read pointers.
     *
     * A temporary file is used to handle the data if it exceeds the
     * space limit specified by the caller.
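The last sentence of that header, the spill to a temporary file once a caller-supplied limit is exceeded, can be illustrated with a toy Python class. This is a loose sketch under my own assumptions: `TinyTupleStore` and `max_in_memory` are hypothetical names, the limit here counts rows rather than bytes, and the toy supports neither reading while writing nor multiple read pointers, both of which the real tuplestore describes.

```python
import pickle
import tempfile


class TinyTupleStore:
    """Toy store: rows stay in memory until the limit is exceeded, then go to a temp file."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory  # hypothetical limit, counted in rows
        self.memory = []
        self.spill_file = None  # created only once the limit is exceeded

    def put(self, row):
        if self.spill_file is None and len(self.memory) < self.max_in_memory:
            self.memory.append(row)
            return
        if self.spill_file is None:
            # Limit exceeded: move everything stored so far into a temporary file.
            self.spill_file = tempfile.TemporaryFile()
            for stored in self.memory:
                pickle.dump(stored, self.spill_file)
            self.memory = []
        self.spill_file.seek(0, 2)  # always append at the end of the file
        pickle.dump(row, self.spill_file)

    def scan(self):
        """Yield the stored rows in insertion order."""
        if self.spill_file is None:
            yield from self.memory
            return
        self.spill_file.seek(0)
        while True:
            try:
                yield pickle.load(self.spill_file)
            except EOFError:  # no more pickled rows in the file
                break


store = TinyTupleStore(max_in_memory=3)
for i in range(1, 6):
    store.put((i,))
print("spilled to disk:", store.spill_file is not None)
print(list(store.scan()))
```

With a limit of three rows, the fourth `put` triggers the spill, and `scan` then reads every row back from the file rather than from memory.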
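Hypothesis confirmed. `Function Scan` executes up front, materializing the function's results, which for large result sets means spilling to disk. `Result` does not materialize anything, but it also only supports trivial operations.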