PostgreSQL

Why do set-returning functions (SRFs) run slower in the FROM clause?

  • June 25, 2018

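**This is a database internals question.** I'm using PostgreSQL 9.5, and I'm wondering why set-returning functions (SRFs), also known as table-valued functions (TVFs), run slower when they appear in a `FROM` clause. For instance, when I run this command,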

CREATE TABLE foo AS SELECT * FROM generate_series(1,1e7);
SELECT 10000000
Time: 5573.574 ms

it is always slower than this one,

CREATE TABLE foo AS SELECT generate_series(1,1e7);
SELECT 10000000
Time: 4622.567 ms

Is there a general rule to be drawn here, namely that we should always run set-returning functions outside of the `FROM` clause?

Let's start by comparing the execution plans:

tinker=> EXPLAIN ANALYZE SELECT * FROM generate_series(1,1e7);
                                                          QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=32) (actual time=2382.582..4291.136 rows=10000000 loops=1)
Planning time: 0.022 ms
Execution time: 5539.522 ms
(3 rows)

tinker=> EXPLAIN ANALYZE SELECT generate_series(1,1e7);
                                          QUERY PLAN                                            
-------------------------------------------------------------------------------------------------
Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.008..2622.365 rows=10000000 loops=1)
Planning time: 0.045 ms
Execution time: 3858.661 ms
(3 rows)

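OK, so now we know that `SELECT * FROM generate_series()` is executed with a `Function Scan` node, while `SELECT generate_series()` is executed with a `Result` node. Whatever makes these queries perform differently comes down to the difference between those two nodes, and we know exactly where to look.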

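Another interesting thing in the `EXPLAIN ANALYZE` output: note the timings. `SELECT generate_series()` shows `actual time=0.008..2622.365`, whereas `SELECT * FROM generate_series()` shows `actual time=2382.582..4291.136`. The `Function Scan` node starts returning records at about the time the `Result` node finishes returning them.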

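What was PostgreSQL doing between `t=0` and `t=2382` in the `Function Scan` plan? That is roughly how long `generate_series()` takes to run, so I would wager that is exactly what it was doing. The answer begins to take shape: `Result` seems to return rows immediately, while `Function Scan` seems to materialize the results first and then scan them.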
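One way to probe this hypothesis is to put a `LIMIT 1` on both queries. This is only a sketch, assuming a psql session against the same 9.5 server. If `Function Scan` really materializes everything up front, the `FROM` form should still pay for generating all ten million rows before it hands back its single row, while the select-list form should be able to stop almost immediately.

```sql
-- psql meta-command: print the elapsed time of each statement
\timing on

-- SRF in FROM: executed by a Function Scan node under the Limit
EXPLAIN ANALYZE SELECT * FROM generate_series(1, 1e7) LIMIT 1;

-- SRF in the select list: executed by a Result node under the Limit
EXPLAIN ANALYZE SELECT generate_series(1, 1e7) LIMIT 1;
```

Compare the first number in `actual time=` (the startup time) of the node beneath `Limit` in each plan; that is the time spent before the first row was available.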

With `EXPLAIN` out of the way, let's check the implementation. The `Result` node lives in `nodeResult.c`, which says:

* DESCRIPTION
*
*      Result nodes are used in queries where no relations are scanned.

The code is simple enough.

`Function Scan` lives in `nodeFunctionScan.c`, and indeed it appears to take a two-phase execution strategy:

/*
* If first time through, read all tuples from function and put them
* in a tuplestore. Subsequent calls just fetch tuples from
* tuplestore.
*/

For clarity, let's see what a `tuplestore` is:

* tuplestore.h
*    Generalized routines for temporary tuple storage.
*
* This module handles temporary storage of tuples for purposes such
* as Materialize nodes, hashjoin batch files, etc.  It is essentially
* a dumbed-down version of tuplesort.c; it does no sorting of tuples
* but can only store and regurgitate a sequence of tuples.  However,
* because no sort is required, it is allowed to start reading the sequence
* before it has all been written.  This is particularly useful for cursors,
* because it allows random access within the already-scanned portion of
* a query without having to process the underlying scan to completion.
* Also, it is possible to support multiple independent read pointers.
*
* A temporary file is used to handle the data if it exceeds the
* space limit specified by the caller.

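Hypothesis confirmed. `Function Scan` executes up front, materializing the function's results, which for large result sets means spilling to disk. `Result` doesn't materialize anything, but it also supports only trivial operations.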
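To look for the spill on your own system, a rough check is to ask `EXPLAIN` for buffer statistics, where temporary-file I/O should appear as `temp read` and `temp written` block counts. The sketch below assumes a session that is allowed to change `work_mem`, and it assumes that `work_mem` is the space limit the function's tuplestore is given. The `1GB` value is purely illustrative; whether ten million rows fit in it depends on per-row overhead, so treat it as a knob to experiment with rather than a recommendation.

```sql
-- Baseline: buffer statistics include temporary-file blocks read and written
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM generate_series(1, 1e7);

-- Give this session more memory before tuplestores spill
-- (illustrative value only; size it to your machine)
SET work_mem = '1GB';
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM generate_series(1, 1e7);

-- Restore the configured default
RESET work_mem;
```

If the second plan reports fewer temp blocks, or none, the difference between the two runs gives a feel for how much of the `Function Scan` cost is disk I/O rather than materialization itself.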

Source: https://dba.stackexchange.com/questions/201576