Postgresql
為什麼 BYTEA 上的 ENCODE 似乎沒有任何性能影響?
我正在研究一種以
hex
固定長度 64 格式將字元串儲存到數據庫的方案。顯然選擇是BYTEA
和CHAR(64)
。最初的想法是強制使用有效的十六進製字元串來儲存它,
BYTEA
這是個好主意,但我評估的影響是ENCODE
在選擇查詢上的使用。考慮到兩個表都有幾百萬行,我做了一些性能基準測試;
# A file query_with_char.sql SELECT "key" FROM table_varchar;; # A file query_with_binary.sql SELECT ENCODE("key", 'hex') FROM table_binary; pgbench -c 30 -T 120 -n -f ./query_with_binary.sql -f ./query_with_char.sql -P 5 -S my_db SQL script 1: ./query_with_binary.sql - weight: 1 (targets 33.3% of total) - 236 transactions (34.6% of total, tps = 1.876072) - latency average = 8896.888 ms - latency stddev = 2548.701 ms SQL script 2: ./query_with_varchar.sql - weight: 1 (targets 33.3% of total) - 225 transactions (33.0% of total, tps = 1.788628) - latency average = 7164.604 ms - latency stddev = 2209.866 ms
我無法理解為什麼查詢的性能
ENCODE
比普通字元串更快。PostgreSQL 如何能夠比僅獲取字元串列更快地對數百萬行進行編碼?有人可以解釋上述測試中可能出現的問題嗎?
encode()
是一個非常便宜的功能。我不希望在您的測試中產生任何可衡量的影響。幾乎可以肯定,這種差異是由於
bytea
與char(64)
. 考慮:SELECT pg_column_size('90b7525e84f64850c2efb407fae3f27190b7525e84f64850c2efb407fae3f271'::char(64)) AS size_char64 , pg_column_size(decode(text '90b7525e84f64850c2efb407fae3f27190b7525e84f64850c2efb407fae3f271', 'hex')) AS size_bytea; size_char64 | size_bytea -------------+------------ 68 | 36
SELECT
簡單查詢性能的主要因素是已讀取的數據頁數。“最佳”數據類型?
顯然選擇是
BYTEA
和CHAR(64)
。如果您的目標是優化性能,請考慮第三種選擇:
2
uuid
列要理解,請先閱讀:
和:
然後考慮這個展示(在第 11 頁執行,但適用於所有現代版本):
記憶體大小:
SELECT pg_column_size(t64) AS c_text , pg_column_size(t64::char(64)) AS c_char64 , pg_column_size(decode(t64, 'hex')) AS c_bytea , pg_column_size( left(t64, 32)::uuid) + pg_column_size(right(t64, 32)::uuid) AS c_2x_uuid FROM (SELECT text '90b7525e84f64850c2efb407fae3f27190b7525e84f64850c2efb407fae3f271') t(t64); c_text | c_char64 | c_bytea | c_2x_uuid --------+----------+---------+----------- 68 | 68 | 36 | 32
磁碟大小(壓縮格式):
CREATE TEMP TABLE c64 AS SELECT t64 AS c_text , t64::char(64) AS c_char64 , decode(t64, 'hex') AS c_bytea , left (t64, 32)::uuid AS c_uuid1 , right(t64, 32)::uuid AS c_uuid2 FROM (SELECT text '90b7525e84f64850c2efb407fae3f27190b7525e84f64850c2efb407fae3f271') t(t64); SELECT pg_column_size(c_text) AS c_text , pg_column_size(c_char64) AS c_char64 , pg_column_size(c_bytea) AS c_bytea , pg_column_size(c_uuid1) + pg_column_size(c_uuid2) AS c_2x_uuid FROM c64; c_text | c_char64 | c_bytea | c_2x_uuid --------+----------+---------+----------- 65 | 65 | 33 | 32
db<>在這裡擺弄
33 位和 32 位之間看似微小的差異實際上可以產生 8 個字節的差異,因為一些儲存機制需要填充 8 個字節的倍數。
使用 2 個 UUID 重複您的測試。我有信心它會名列前茅。