Why do set-returning functions (SRFs) run slower in the FROM clause?

June 25, 2018

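**This is a database-internals question.** I'm using PostgreSQL 9.5, and I'm wondering why set-returning functions (SRFs), also known as table-valued functions (TVFs), run slower when they appear in the `FROM` clause. For instance, when I run this command,

    CREATE TABLE foo AS SELECT * FROM generate_series(1,1e7);
    SELECT 10000000
    Time: 5573.574 ms

it is always slower than,

    CREATE TABLE foo AS SELECT generate_series(1,1e7);
    SELECT 10000000
    Time: 4622.567 ms

Is there a general rule to be made here, such that we should always run set-returning functions outside of the `FROM` clause?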

Let's start by comparing the execution plans:

    tinker=> EXPLAIN ANALYZE SELECT * FROM generate_series(1,1e7);
                                                              QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------
     Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=32) (actual time=2382.582..4291.136 rows=10000000 loops=1)
     Planning time: 0.022 ms
     Execution time: 5539.522 ms
    (3 rows)

    tinker=> EXPLAIN ANALYZE SELECT generate_series(1,1e7);
                                              QUERY PLAN
    -------------------------------------------------------------------------------------------------
     Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.008..2622.365 rows=10000000 loops=1)
     Planning time: 0.045 ms
     Execution time: 3858.661 ms
    (3 rows)
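Okay, so now we know that `SELECT * FROM generate_series()` is executed using a `Function Scan` node, while `SELECT generate_series()` is executed using a `Result` node. Whatever causes these queries to perform differently boils down to the difference between these two nodes, so we know exactly where to look.

Another interesting thing in the `EXPLAIN ANALYZE` output: note the timings. `SELECT generate_series()` reports `actual time=0.008..2622.365`, while `SELECT * FROM generate_series()` reports `actual time=2382.582..4291.136`. The `Function Scan` node **starts** returning records at about the time the `Result` node **finishes** returning them.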
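For readers less used to `EXPLAIN ANALYZE`: the two numbers in `actual time=A..B` are the startup time (milliseconds until the node returns its first row) and the total time (milliseconds until it has returned its last row). The Python sketch below simply pulls those two numbers out of the plan lines quoted above. The helper name `actual_time` is made up for this illustration and is not part of any PostgreSQL tooling.

```python
import re

# Plan lines copied from the EXPLAIN ANALYZE output above.
FUNCTION_SCAN = ("Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=32) "
                 "(actual time=2382.582..4291.136 rows=10000000 loops=1)")
RESULT = ("Result  (cost=0.00..5.01 rows=1000 width=0) "
          "(actual time=0.008..2622.365 rows=10000000 loops=1)")


def actual_time(plan_line):
    """Return (startup_ms, total_ms) parsed from an 'actual time=A..B' annotation.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"actual time=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)", plan_line)
    if match is None:
        raise ValueError("no 'actual time' annotation found")
    return float(match.group(1)), float(match.group(2))


for name, line in (("Function Scan", FUNCTION_SCAN), ("Result", RESULT)):
    startup_ms, total_ms = actual_time(line)
    print(f"{name}: first row after {startup_ms} ms, last row after {total_ms} ms")
```

Read this way, the `Function Scan` startup time (about 2.4 s) is nearly as large as the `Result` node's total time (about 2.6 s).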
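What was PostgreSQL doing between `t=0` and `t=2382` in the `Function Scan` plan? Apparently that's about how long it takes to run `generate_series()`, so I'd wager that's exactly what it was doing. The answer begins to take shape: `Result` seems to return results immediately, while `Function Scan` seems to materialize the results and then scan them.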
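The difference between the two suspected strategies can be modelled in a few lines of Python. This is only a toy model, not PostgreSQL's code: `generate_series`, `result_node`, and `function_scan_node` are hypothetical stand-ins, and instead of wall-clock time the sketch counts how many values the function had already produced when the caller received its first row.

```python
def generate_series(start, stop, log):
    """Stand-in for the SRF: yields values lazily and records each one it produces."""
    for value in range(start, stop + 1):
        log.append(value)
        yield value


def result_node(start, stop, log):
    """Streaming model: pass each value to the caller as soon as it is produced."""
    yield from generate_series(start, stop, log)


def function_scan_node(start, stop, log):
    """Materializing model: run the function to completion, then scan the stored rows."""
    store = list(generate_series(start, stop, log))  # phase 1: fill the store
    yield from store                                 # phase 2: read the rows back


def rows_produced_before_first_row(node, n):
    """How many values had the function produced when the first row reached the caller?"""
    log = []
    rows = node(1, n, log)
    next(rows)  # fetch only the first row
    return len(log)


print(rows_produced_before_first_row(result_node, 1000))         # 1
print(rows_produced_before_first_row(function_scan_node, 1000))  # 1000
```

In the streaming model the first row costs one step of the function; in the materializing model it costs the whole run, which is the pattern the startup times above suggest.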
With `EXPLAIN` out of the way, let's check the implementation. The `Result` node lives in `nodeResult.c`, which says:

     * DESCRIPTION
     *
     *      Result nodes are used in queries where no relations are scanned.

The code is simple enough.

`Function Scan` lives in `nodeFunctionScan.c`, and indeed it appears to take a two-phase execution strategy:

    /*
     * If first time through, read all tuples from function and put them
     * in a tuplestore. Subsequent calls just fetch tuples from
     * tuplestore.
     */

And for clarity, let's see what a `tuplestore` is:

     * tuplestore.h
     *    Generalized routines for temporary tuple storage.
     *
     * This module handles temporary storage of tuples for purposes such
     * as Materialize nodes, hashjoin batch files, etc.  It is essentially
     * a dumbed-down version of tuplesort.c; it does no sorting of tuples
     * but can only store and regurgitate a sequence of tuples.  However,
     * because no sort is required, it is allowed to start reading the sequence
     * before it has all been written.  This is particularly useful for cursors,
     * because it allows random access within the already-scanned portion of
     * a query without having to process the underlying scan to completion.
     * Also, it is possible to support multiple independent read pointers.
     *
     * A temporary file is used to handle the data if it exceeds the
     * space limit specified by the caller.

Hypothesis confirmed. `Function Scan` executes up front, materializing the function's results, which for large result sets means spilling to disk. `Result` doesn't materialize anything, but it also supports only trivial operations.
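To illustrate the "store and regurgitate, spill to a temporary file past a limit" behaviour described in the header comment, here is a rough Python sketch. It is an assumption-laden toy: the class `TupleStore` and its `max_rows_in_memory` parameter are invented for this example, the limit is counted in rows rather than in bytes of space as in the real module, and nothing here mirrors the actual C API.

```python
import os
import pickle
import tempfile


class TupleStore:
    """Toy append-then-read store that spills to a temporary file past a row limit."""

    def __init__(self, max_rows_in_memory):
        self.max_rows_in_memory = max_rows_in_memory  # hypothetical limit, in rows
        self.rows = []          # rows held in memory
        self.spill_file = None  # temporary file, created only when the limit is hit

    @property
    def spilled(self):
        return self.spill_file is not None

    def put(self, row):
        if self.spill_file is None and len(self.rows) < self.max_rows_in_memory:
            self.rows.append(row)
            return
        if self.spill_file is None:
            # Limit reached: move everything held so far into a temporary file.
            self.spill_file = tempfile.TemporaryFile()
            for held in self.rows:
                pickle.dump(held, self.spill_file)
            self.rows = []
        self.spill_file.seek(0, os.SEEK_END)  # always append at the end
        pickle.dump(row, self.spill_file)

    def scan(self):
        """Yield every stored row in insertion order."""
        if self.spill_file is None:
            yield from self.rows
            return
        self.spill_file.seek(0)
        while True:
            try:
                yield pickle.load(self.spill_file)
            except EOFError:
                break


small = TupleStore(max_rows_in_memory=1000)
for i in range(1, 11):
    small.put((i,))

big = TupleStore(max_rows_in_memory=1000)
for i in range(1, 5001):
    big.put((i,))

print(small.spilled, big.spilled)   # False True
print(sum(1 for _ in big.scan()))   # 5000
```

A small result set never touches the file, while a large one pays for writing every row out and reading it back, which is consistent with the extra cost seen for `Function Scan` on ten million rows.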

Source: https://dba.stackexchange.com/questions/201576
