Index
PostgreSQL 9.4 分析,普通列、索引列和jsonb key的性能
如果我有一個包含以下內容的表格:
CREATE TABLE test( id SERIAL PRIMARY KEY, name VARCHAR(200), age INT, data JSONB );
和
data
列填充{"name": xxx, "age": yyy}
,有時{"name": xxx, "age": yyy, "somethingElse": zzz}
哪個會更快?查詢
test.data->>name
或test.name
?和test.data->>age
或test.age
?任何人都可以為我分析以下這些案例和案例嗎?
INSERT INTO test(name,age,data) SELECT z , gs.y , ('{"name":"' || z || '","age":' || gs.y || '}')::JSONB FROM (SELECT md5(z::text) z, random() AS y FROM generate_series(1,1000000) AS gs(z)) AS gs; EXPLAIN SELECT * FROM test ORDER BY random() LIMIT 30; QUERY PLAN --------------------------------------------------------------------------- Limit (cost=30412.00..30412.07 rows=30 width=458) -> Sort (cost=30412.00..31139.32 rows=290928 width=458) Sort Key: (random()) -> Seq Scan on test (cost=0.00..21819.60 rows=290928 width=458)
整數檢驗
EXPLAIN SELECT * FROM test WHERE age < 0.5 ORDER BY random() LIMIT 30; QUERY PLAN --------------------------------------------------------------------------- Limit (cost=43861.14..43861.22 rows=30 width=116) -> Sort (cost=43861.14..44694.47 rows=333333 width=116) Sort Key: (random()) -> Seq Scan on test (cost=0.00..34016.33 rows=333333 width=116) Filter: ((age)::numeric < 0.5) EXPLAIN SELECT * FROM test WHERE (data->>'age')::FLOAT < 0.5 ORDER BY random() LIMIT 30; QUERY PLAN -------------------------------------------------------------------------------------------- Limit (cost=48861.14..48861.22 rows=30 width=116) -> Sort (cost=48861.14..49694.47 rows=333333 width=116) Sort Key: (random()) -> Seq Scan on test (cost=0.00..39016.33 rows=333333 width=116) Filter: (((data ->> 'age'::text))::double precision < 0.5::double precision) EXPLAIN SELECT * FROM test WHERE (data->>'age')::NUMERIC < 0.5 ORDER BY random() LIMIT 30; QUERY PLAN --------------------------------------------------------------------------- Limit (cost=48861.14..48861.22 rows=30 width=116) -> Sort (cost=48861.14..49694.47 rows=333333 width=116) Sort Key: (random()) -> Seq Scan on test (cost=0.00..39016.33 rows=333333 width=116) Filter: (((data ->> 'age'::text))::numeric < 0.5)
字元串測試
EXPLAIN SELECT * FROM test WHERE name LIKE '%aaa%' ORDER BY random() LIMIT 30; QUERY PLAN -------------------------------------------------------------------------- Limit (cost=31006.58..31006.66 rows=30 width=116) -> Sort (cost=31006.58..31031.83 rows=10101 width=116) Sort Key: (random()) -> Seq Scan on test (cost=0.00..30708.25 rows=10101 width=116) Filter: ((name)::text ~~ '%aaa%'::text) EXPLAIN SELECT * FROM test WHERE data->>name LIKE '%aaa%' ORDER BY random() LIMIT 30; QUERY PLAN -------------------------------------------------------------------------- Limit (cost=34464.38..34464.45 rows=30 width=116) -> Sort (cost=34464.38..34564.38 rows=40000 width=116) Sort Key: (random()) -> Seq Scan on test (cost=0.00..33283.00 rows=40000 width=116) Filter: ((data ->> (name)::text) ~~ '%aaa%'::text)
創建索引後
CREATE INDEX test_name ON test(name); CREATE INDEX test_age on test(age); EXPLAIN SELECT * FROM test ORDER BY random() LIMIT 30; QUERY PLAN ---------------------------------------------------------------------------- Limit (cost=60217.45..60217.53 rows=30 width=116) -> Sort (cost=60217.45..62717.45 rows=1000000 width=116) Sort Key: (random()) -> Seq Scan on test (cost=0.00..30683.00 rows=1000000 width=116) EXPLAIN SELECT * FROM test WHERE age < 0.5 ORDER BY random() LIMIT 30; QUERY PLAN --------------------------------------------------------------------------- Limit (cost=43861.14..43861.22 rows=30 width=116) -> Sort (cost=43861.14..44694.47 rows=333333 width=116) Sort Key: (random()) -> Seq Scan on test (cost=0.00..34016.33 rows=333333 width=116) Filter: ((age)::numeric < 0.5) EXPLAIN SELECT * FROM test WHERE name LIKE '%aaa%' ORDER BY random() LIMIT 30; QUERY PLAN -------------------------------------------------------------------------- Limit (cost=31006.58..31006.66 rows=30 width=116) -> Sort (cost=31006.58..31031.83 rows=10101 width=116) Sort Key: (random()) -> Seq Scan on test (cost=0.00..30708.25 rows=10101 width=116) Filter: ((name)::text ~~ '%aaa%'::text)
為什麼索引在這裡沒有效果?
1.表定義
更好的:
CREATE TABLE test( test_id serial PRIMARY KEY, age INT, name text, data JSONB );
由於類型和/的對齊要求,最好先將兩列放在一起。更多的:
integer``varchar``text``int
此外,“年齡”一開始是一個可疑的列。通常最好將“生日”或“創建日期”儲存為絕對資訊,而“年齡”很快就會腐爛。
2.測試數據
INSERT INTO test(name,age,data) SELECT name, age, row_to_json(sub)::jsonb FROM ( SELECT md5(g::text) AS name, (random() * 100)::int AS age FROM generate_series(1,1000000) g ) sub;
row_to_json()
比手工製作琴弦要優雅得多。- 如果你投到
random()
,integer
你得到0
or1
。對於任何類型的測試來說都是**無用的數據。**特別是,只有兩個不同的公共值的索引是無用的。乘以 100 得到一半有用的東西。3、為什麼不使用索引?
你的 btree 索引:
CREATE INDEX test_name ON test(name);
對於不是左錨定的查詢 ( )是無用的。詳情在這裡:
LIKE``name LIKE '%aaa%'
- PostgreSQL LIKE 查詢性能變化
- LIKE 是如何實現的?
- 在 PostgreSQL 中使用 LIKE、SIMILAR TO 或正則表達式進行模式匹配
- Postgres 中 LIKE 和 ~ 的區別
你的 btree 索引:
CREATE INDEX test_age on test(age);
是沒用的,因為您只有在您的列
0
並且通常僅在表的一小部分(〜低於5%,這取決於)預計符合條件時才使用idex。用更真實的數據再試一次。1``age
4、測試方法
EXPLAIN
只顯示查詢計劃。要獲得實際性能使用EXPLAIN ANALYZE
,或者要獲得最準確的總時間:EXPLAIN ( ANALYZE, TIMING OFF)
.一般建議
具有標量值的列通常在各個方面都比文件類型(如
jsonb
.jsonb
(pg 9.4) 已經收到了重大改進json
,但這仍然是正確的。對 json 對象內的值進行謂詞的查詢比對標量列的謂詞要慢。使用優化(功能)索引,jsonb
可以接近,但它永遠不會那麼快。