PostgreSQL

Why does ENCODE on BYTEA seem to have no performance impact?

  • November 2, 2018

I'm working on a scheme that stores strings in the database as fixed-length, 64-character hex. The obvious choices are BYTEA or CHAR(64).

The initial idea was that storing them as BYTEA would be a nice way to enforce valid hex strings, but I wanted to evaluate the impact of using ENCODE in the SELECT queries.

With both tables holding a few million rows, I ran some performance benchmarks:

# A file query_with_varchar.sql
SELECT "key" FROM table_varchar;

# A file query_with_binary.sql
SELECT ENCODE("key", 'hex') FROM table_binary;

pgbench -c 30 -T 120 -n -f ./query_with_binary.sql -f ./query_with_varchar.sql -P 5 -S my_db

SQL script 1: ./query_with_binary.sql
- weight: 1 (targets 33.3% of total)
- 236 transactions (34.6% of total, tps = 1.876072)
- latency average = 8896.888 ms
- latency stddev = 2548.701 ms
SQL script 2: ./query_with_varchar.sql
- weight: 1 (targets 33.3% of total)
- 225 transactions (33.0% of total, tps = 1.788628)
- latency average = 7164.604 ms
- latency stddev = 2209.866 ms
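
For anyone reproducing this: the question doesn't show how the tables were built, but they can be populated with random 64-character hex values along these lines (a sketch only; table and column names follow the queries above):

-- Sketch of a possible setup: the same random value stored once as
-- char(64) hex text and once as a 32-byte bytea.
CREATE TABLE table_varchar ("key" char(64) NOT NULL);
CREATE TABLE table_binary  ("key" bytea    NOT NULL);

INSERT INTO table_varchar ("key")
SELECT md5(i::text) || md5((i + 1)::text)   -- 64 hex characters
FROM   generate_series(1, 3000000) i;

INSERT INTO table_binary ("key")
SELECT decode("key", 'hex')                 -- same values, 32 bytes each
FROM   table_varchar;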

What I can't understand is why the query with ENCODE performs faster than the plain-string one. How can PostgreSQL encode millions of rows faster than it can simply fetch a string column?

Can someone explain what might be wrong with the test above?

encode() is a very cheap function. I wouldn't expect it to have any measurable impact in your test.

The difference is almost certainly due to bytea being smaller than char(64). Consider:

SELECT pg_column_size('90b7525e84f64850c2efb407fae3f27190b7525e84f64850c2efb407fae3f271'::char(64)) AS size_char64
    , pg_column_size(decode(text '90b7525e84f64850c2efb407fae3f27190b7525e84f64850c2efb407fae3f271', 'hex')) AS size_bytea;

size_char64 | size_bytea
-------------+------------
         68 |         36

The dominant factor for the performance of simple SELECT queries is the number of data pages that have to be read.
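
One way to verify this on the tables from the question is to compare buffer counts; an illustrative sketch, not part of the original answer:

-- Compare how many shared buffers (8 kB pages) each sequential scan touches.
EXPLAIN (ANALYZE, BUFFERS)
SELECT encode("key", 'hex') FROM table_binary;

EXPLAIN (ANALYZE, BUFFERS)
SELECT "key" FROM table_varchar;

The "Buffers: shared hit/read" figures show how many pages each scan touched; the narrower bytea rows should fit on noticeably fewer pages.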

The "best" data type?

The obvious choices are BYTEA or CHAR(64)

If your goal is to optimize performance, consider a third option:

2 x uuid

Then consider this demo (executed on Postgres 11, but it holds for all modern versions):

Size in RAM:

SELECT pg_column_size(t64)                   AS c_text
    , pg_column_size(t64::char(64))         AS c_char64
    , pg_column_size(decode(t64, 'hex'))    AS c_bytea
    , pg_column_size( left(t64, 32)::uuid)
    + pg_column_size(right(t64, 32)::uuid)  AS c_2x_uuid
FROM  (SELECT text '90b7525e84f64850c2efb407fae3f27190b7525e84f64850c2efb407fae3f271') t(t64);

c_text | c_char64 | c_bytea | c_2x_uuid 
--------+----------+---------+-----------
    68 |       68 |      36 |        32

Size on disk (compressed format):

CREATE TEMP TABLE c64 AS 
SELECT t64                    AS c_text
    , t64::char(64)          AS c_char64
    , decode(t64, 'hex')     AS c_bytea
    , left (t64, 32)::uuid   AS c_uuid1
    , right(t64, 32)::uuid   AS c_uuid2
FROM  (SELECT text '90b7525e84f64850c2efb407fae3f27190b7525e84f64850c2efb407fae3f271') t(t64);

SELECT pg_column_size(c_text)    AS c_text
    , pg_column_size(c_char64)  AS c_char64
    , pg_column_size(c_bytea)   AS c_bytea
    , pg_column_size(c_uuid1)
    + pg_column_size(c_uuid2)   AS c_2x_uuid
FROM   c64;

c_text | c_char64 | c_bytea | c_2x_uuid 
--------+----------+---------+-----------
    65 |       65 |      33 |        32

db<>fiddle here

The seemingly tiny difference between 33 and 32 bytes can actually make a difference of 8 bytes, because some parts of the storage are padded to multiples of 8 bytes.
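
That padding effect can be checked directly by comparing heap sizes; a sketch with hypothetical tables (the names are mine, not from the answer):

-- Hypothetical tables: the same 100000 values stored as one bytea column
-- vs. two uuid columns.
CREATE TABLE t_bytea     (key bytea);
CREATE TABLE table_2uuid (key_hi uuid, key_lo uuid);

INSERT INTO t_bytea
SELECT decode(md5(i::text) || md5((i + 1)::text), 'hex')
FROM   generate_series(1, 100000) i;

INSERT INTO table_2uuid
SELECT md5(i::text)::uuid, md5((i + 1)::text)::uuid
FROM   generate_series(1, 100000) i;

-- Heap size in bytes; each bytea row should typically need one extra
-- 8-byte alignment step, so the first table comes out larger.
SELECT pg_relation_size('t_bytea')     AS bytea_heap
     , pg_relation_size('table_2uuid') AS two_uuid_heap;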

Repeat your test with 2 UUIDs. I am confident it will come out on top.
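
A sketch of the query file for such a repeat, assuming the hypothetical table_2uuid from the sketch above; replace() strips the hyphens so the result matches the original 64-character hex string:

# A file query_with_2uuid.sql (hypothetical)
SELECT replace(key_hi::text, '-', '') || replace(key_lo::text, '-', '') AS "key"
FROM   table_2uuid;

Like encode(), replace() is cheap per row; any gain should come from the narrower rows, which means fewer data pages to read.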

Quoted from: https://dba.stackexchange.com/questions/221630