PostgreSQL 位串的索引(每個最多 20,000 位)
我正在建構一個包含化合物(數百萬行)的表,並且在這些化合物中,某些預定的特徵/片段被標記在一個固定長度的位串中。這個位串將有 2000 到 20000 位,需要做進一步的研究來確定更精確的數字。當搜尋具有某些特定特徵或缺乏特定特徵的化合物時,將在該位串的選定子集上進行搜尋。每次都可以是不同的子集。
是否有一種索引類型可以使這些搜尋在 PostgreSQL(9.6 或 10)中高效?
插入很少發生並且在批處理過程中完成,而搜尋是最常用的操作,並且最好應該快速並且沒有誤報或漏報。
對我來說,這模糊地聽起來像是 GIN 索引的工作,但我對這種索引類型的理解不足以確定是否真的如此。
實際上可能有另一種解決方案,那就是創建一個單獨的“fragment_index”表,帶有片段標識符(因為它們在位串中具有固定位置,它們也有一個數字標識符)+複合ID對。我擔心該表可能會變得巨大(20M 複合,平均 50 次命中片段 = 1G 行)和針對它的多個連接(每個片段一個),其中該連接還可以返回高達 80% 的與復合表匹配(在某些情況下這很可能)根本不會執行。
我將不勝感激得到任何建議,以實現這一目標。
更新:我嘗試了在帶有編碼短程式碼的 varchar 數組上使用 trigram 模組的 GIN 索引,它給出了混合結果,主要取決於過濾操作後剩餘的數據量。
為了給出具有任何意義的範例,我們假設該表如下所示:
CREATE TABLE testcompounds ( id serial primary key, cd_structure text, features_as_text varchar(128), features_as_bits bit varying(32) ); CREATE INDEX flags_testcompounds on testcompounds using gin (features_as_text gin_trgm_ops); CREATE TABLE fragments ( id serial primary key, smarts text, keystring varchar(4), frequency int ); insert into fragments (keystring,smarts) values('AAA', '*=O'); insert into fragments (keystring,smarts) values('AAB', '[#6]=O'); insert into fragments (keystring,smarts) values('AAC', '[#7]=O'); ... insert into fragments (keystring,smarts) values('AAN', '[#6]-F'); insert into fragments (keystring,smarts) values('AAO', '[#6]-Cl'); insert into fragments (keystring,smarts) values('AAP', '[#6]-Br'); ... etc.
並且 features_as_text 和 features_as_bits 欄位已完全填充。
可以在此執行的查詢範例如下:
select id, cd_structure from testcompounds where (features_as_bits & (B'00000000000000000000000000000001' << (2-1)) = (B'00000000000000000000000000000001' << (2-1))) AND (features_as_bits & (B'00000000000000000000000000000001' << (18-1)) = (B'00000000000000000000000000000000')) AND (features_as_bits & (B'00000000000000000000000000000001' << (19-1)) = (B'00000000000000000000000000000000')) AND (features_as_bits & (B'00000000000000000000000000000001' << (5-1)) = (B'00000000000000000000000000000000')) AND (features_as_bits & (B'00000000000000000000000000000001' << (6-1)) = (B'00000000000000000000000000000000')) AND (features_as_bits & (B'00000000000000000000000000000001' << (7-1)) = (B'00000000000000000000000000000000')) AND (features_as_bits & (B'00000000000000000000000000000001' << (8-1)) = (B'00000000000000000000000000000000')) AND (features_as_bits & (B'00000000000000000000000000000001' << (9-1)) = (B'00000000000000000000000000000000'))
換句話說:獲取具有特徵 2 且不具有任何特徵 18、19、5、6、7、8、9 的所有內容
我確實測試了一些按位運算的索引策略。過程有點長,不好意思。
首先,
- 這只是對我的RDS環境的一個測試。您的數據庫性能取決於您的環境和數據(CPU、記憶體、磁碟、表中的行數、列的基數等)
- 我不熟悉 PostgreSQL。特別是,我不確定Bloom索引的哪個位長最好。(有人可以幫我嗎?)
- 通常,較小的索引大小是首選。因此,表分區可能是一個很好的解決方案。
環境
- 版本:PostgreSQL 10.3
- 主機:Amazon RDS db.m4.large 的新實例(vCPU:2,記憶體:8GB,磁碟:GP2-SSD 100GB)
表/索引定義
創建擴展
CREATE EXTENSION IF NOT EXISTS intarray; CREATE EXTENSION IF NOT EXISTS bloom;
創建表
CREATE TABLE IF NOT EXISTS t_bitwise ( id BIGINT ,c_int INTEGER ,c_bit BIT(31) ,c_int_arr _int4 ,c_bool_0 BOOLEAN ,c_bool_1 BOOLEAN ,c_bool_2 BOOLEAN ,c_bool_3 BOOLEAN ,c_bool_4 BOOLEAN ,c_bool_5 BOOLEAN ,c_bool_6 BOOLEAN ,c_bool_7 BOOLEAN ,c_bool_8 BOOLEAN ,c_bool_9 BOOLEAN ,c_bool_10 BOOLEAN ,c_bool_11 BOOLEAN ,c_bool_12 BOOLEAN ,c_bool_13 BOOLEAN ,c_bool_14 BOOLEAN ,c_bool_15 BOOLEAN ,c_bool_16 BOOLEAN ,c_bool_17 BOOLEAN ,c_bool_18 BOOLEAN ,c_bool_19 BOOLEAN ,c_bool_20 BOOLEAN ,c_bool_21 BOOLEAN ,c_bool_22 BOOLEAN ,c_bool_23 BOOLEAN ,c_bool_24 BOOLEAN ,c_bool_25 BOOLEAN ,c_bool_26 BOOLEAN ,c_bool_27 BOOLEAN ,c_bool_28 BOOLEAN ,c_bool_29 BOOLEAN ,c_bool_30 BOOLEAN ,PRIMARY KEY (id) );
在每個布爾列上創建一個 btree 索引
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_1 ON t_bitwise (c_bool_1); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_2 ON t_bitwise (c_bool_2); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_3 ON t_bitwise (c_bool_3); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_4 ON t_bitwise (c_bool_4); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_5 ON t_bitwise (c_bool_5); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_6 ON t_bitwise (c_bool_6); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_7 ON t_bitwise (c_bool_7); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_8 ON t_bitwise (c_bool_8); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_9 ON t_bitwise (c_bool_9); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_10 ON t_bitwise (c_bool_10); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_11 ON t_bitwise (c_bool_11); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_12 ON t_bitwise (c_bool_12); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_13 ON t_bitwise (c_bool_13); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_14 ON t_bitwise (c_bool_14); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_15 ON t_bitwise (c_bool_15); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_16 ON t_bitwise (c_bool_16); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_17 ON t_bitwise (c_bool_17); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_18 ON t_bitwise (c_bool_18); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_19 ON t_bitwise (c_bool_19); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_20 ON t_bitwise (c_bool_20); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_21 ON t_bitwise (c_bool_21); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_22 ON t_bitwise (c_bool_22); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_23 ON t_bitwise (c_bool_23); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_24 ON t_bitwise (c_bool_24); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_25 ON t_bitwise (c_bool_25); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_26 ON t_bitwise (c_bool_26); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_27 ON t_bitwise (c_bool_27); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_28 ON t_bitwise (c_bool_28); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_29 ON t_bitwise (c_bool_29); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_30 ON t_bitwise (c_bool_30); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_31 ON t_bitwise (c_bool_31); CREATE INDEX IF NOT EXISTS idx_btree_on_bool_32 ON t_bitwise (c_bool_32);
在 bool 列上創建一個複合 btree 索引
CREATE INDEX IF NOT EXISTS idx_btree_on_composite_bool ON t_bitwise ( c_bool_32 ,c_bool_31 ,c_bool_30 ,c_bool_29 ,c_bool_28 ,c_bool_27 ,c_bool_26 ,c_bool_25 ,c_bool_24 ,c_bool_23 ,c_bool_22 ,c_bool_21 ,c_bool_20 ,c_bool_19 ,c_bool_18 ,c_bool_17 ,c_bool_16 ,c_bool_15 ,c_bool_14 ,c_bool_13 ,c_bool_12 ,c_bool_11 ,c_bool_10 ,c_bool_9 ,c_bool_8 ,c_bool_7 ,c_bool_6 ,c_bool_5 ,c_bool_4 ,c_bool_3 ,c_bool_2 ,c_bool_1 ) ;
在 bool 列上創建一個Bloom索引
CREATE INDEX IF NOT EXISTS idx_bloom_on_bool ON t_bitwise USING bloom ( CAST(c_bool_1 AS INTEGER) ,CAST(c_bool_2 AS INTEGER) ,CAST(c_bool_3 AS INTEGER) ,CAST(c_bool_4 AS INTEGER) ,CAST(c_bool_5 AS INTEGER) ,CAST(c_bool_6 AS INTEGER) ,CAST(c_bool_7 AS INTEGER) ,CAST(c_bool_8 AS INTEGER) ,CAST(c_bool_9 AS INTEGER) ,CAST(c_bool_10 AS INTEGER) ,CAST(c_bool_11 AS INTEGER) ,CAST(c_bool_12 AS INTEGER) ,CAST(c_bool_13 AS INTEGER) ,CAST(c_bool_14 AS INTEGER) ,CAST(c_bool_15 AS INTEGER) ,CAST(c_bool_16 AS INTEGER) ,CAST(c_bool_17 AS INTEGER) ,CAST(c_bool_18 AS INTEGER) ,CAST(c_bool_19 AS INTEGER) ,CAST(c_bool_20 AS INTEGER) ,CAST(c_bool_21 AS INTEGER) ,CAST(c_bool_22 AS INTEGER) ,CAST(c_bool_23 AS INTEGER) ,CAST(c_bool_24 AS INTEGER) ,CAST(c_bool_25 AS INTEGER) ,CAST(c_bool_26 AS INTEGER) ,CAST(c_bool_27 AS INTEGER) ,CAST(c_bool_28 AS INTEGER) ,CAST(c_bool_29 AS INTEGER) ,CAST(c_bool_30 AS INTEGER) ,CAST(c_bool_31 AS INTEGER) ,CAST(c_bool_32 AS INTEGER) ) WITH (length=128) ;
在 int 列上創建一個Bloom索引
CREATE OR REPLACE FUNCTION bitcheck_int(INTEGER, INTEGER) RETURNS INTEGER LANGUAGE SQL IMMUTABLE SECURITY INVOKER PARALLEL SAFE COST 0.01 AS $$ SELECT CASE WHEN $1 & (1 << ($2 - 1)) > 0 THEN $2 ELSE -$2 END $$ ; CREATE INDEX IF NOT EXISTS idx_bloom_on_int ON t_bitwise USING bloom ( bitcheck_int(c_int, 1) ,bitcheck_int(c_int, 2) ,bitcheck_int(c_int, 3) ,bitcheck_int(c_int, 4) ,bitcheck_int(c_int, 5) ,bitcheck_int(c_int, 6) ,bitcheck_int(c_int, 7) ,bitcheck_int(c_int, 8) ,bitcheck_int(c_int, 9) ,bitcheck_int(c_int, 10) ,bitcheck_int(c_int, 11) ,bitcheck_int(c_int, 12) ,bitcheck_int(c_int, 13) ,bitcheck_int(c_int, 14) ,bitcheck_int(c_int, 15) ,bitcheck_int(c_int, 16) ,bitcheck_int(c_int, 17) ,bitcheck_int(c_int, 18) ,bitcheck_int(c_int, 19) ,bitcheck_int(c_int, 20) ,bitcheck_int(c_int, 21) ,bitcheck_int(c_int, 22) ,bitcheck_int(c_int, 23) ,bitcheck_int(c_int, 24) ,bitcheck_int(c_int, 25) ,bitcheck_int(c_int, 26) ,bitcheck_int(c_int, 27) ,bitcheck_int(c_int, 28) ,bitcheck_int(c_int, 29) ,bitcheck_int(c_int, 20) ,bitcheck_int(c_int, 31) ,bitcheck_int(c_int, 32) ) WITH (length=128) ;
在位列上創建一個bloom索引
CREATE OR REPLACE FUNCTION bitcheck_bit(BIT(32), INTEGER) RETURNS INTEGER LANGUAGE SQL IMMUTABLE SECURITY INVOKER PARALLEL SAFE COST 0.01 AS $$ SELECT CASE WHEN $1::INTEGER & (1 << ($2 - 1)) > 0 THEN $2 ELSE -$2 END $$ ; CREATE INDEX IF NOT EXISTS idx_bloom_on_bit ON t_bitwise USING bloom ( bitcheck_bit(c_bit, 1) ,bitcheck_bit(c_bit, 2) ,bitcheck_bit(c_bit, 3) ,bitcheck_bit(c_bit, 4) ,bitcheck_bit(c_bit, 5) ,bitcheck_bit(c_bit, 6) ,bitcheck_bit(c_bit, 7) ,bitcheck_bit(c_bit, 8) ,bitcheck_bit(c_bit, 9) ,bitcheck_bit(c_bit, 10) ,bitcheck_bit(c_bit, 11) ,bitcheck_bit(c_bit, 12) ,bitcheck_bit(c_bit, 13) ,bitcheck_bit(c_bit, 14) ,bitcheck_bit(c_bit, 15) ,bitcheck_bit(c_bit, 16) ,bitcheck_bit(c_bit, 17) ,bitcheck_bit(c_bit, 18) ,bitcheck_bit(c_bit, 19) ,bitcheck_bit(c_bit, 20) ,bitcheck_bit(c_bit, 21) ,bitcheck_bit(c_bit, 22) ,bitcheck_bit(c_bit, 23) ,bitcheck_bit(c_bit, 24) ,bitcheck_bit(c_bit, 25) ,bitcheck_bit(c_bit, 26) ,bitcheck_bit(c_bit, 27) ,bitcheck_bit(c_bit, 28) ,bitcheck_bit(c_bit, 29) ,bitcheck_bit(c_bit, 20) ,bitcheck_bit(c_bit, 31) ,bitcheck_bit(c_bit, 32) ) WITH (length=128) ;
在 int 數組列上創建一個 GIN 索引
CREATE INDEX IF NOT EXISTS idx_gin_on_int_arr ON t_bitwise USING GIN(c_int_arr gin__int_ops) ;
在 int 列上創建一個 GIN int 索引
CREATE OR REPLACE FUNCTION convert_int_to_int_arr(INTEGER) RETURNS _int4 LANGUAGE SQL IMMUTABLE STRICT SECURITY INVOKER PARALLEL SAFE COST 0.01 AS $$ SELECT ARRAY( SELECT CASE WHEN (1 << (i - 1)) & $1 > 0 THEN i ELSE -i END FROM generate_series(1, 32) AS i ) $$ ; CREATE INDEX IF NOT EXISTS idx_gin_int_on_int ON t_bitwise USING GIN(convert_int_to_int_arr(c_int) gin__int_ops) ;
在位列上創建一個 GIN int 索引
CREATE OR REPLACE FUNCTION convert_bit_to_int_arr(BIT(32)) RETURNS _int4 LANGUAGE SQL IMMUTABLE STRICT SECURITY INVOKER PARALLEL SAFE COST 0.01 AS $$ SELECT ARRAY( SELECT CASE WHEN (1 << (i - 1)) & $1::INTEGER > 0 THEN i ELSE -i END FROM generate_series(1, 32) AS i ) $$ ; CREATE INDEX IF NOT EXISTS idx_gin_int_on_bit ON t_bitwise USING GIN(convert_bit_to_int_arr(c_bit) gin__int_ops) ;
插入
插入
這需要很長時間。
INSERT INTO t_bitwise ( id ,c_int ,c_bit ,c_int_arr ,c_bool_1 ,c_bool_2 ,c_bool_3 ,c_bool_4 ,c_bool_5 ,c_bool_6 ,c_bool_7 ,c_bool_8 ,c_bool_9 ,c_bool_10 ,c_bool_11 ,c_bool_12 ,c_bool_13 ,c_bool_14 ,c_bool_15 ,c_bool_16 ,c_bool_17 ,c_bool_18 ,c_bool_19 ,c_bool_20 ,c_bool_21 ,c_bool_22 ,c_bool_23 ,c_bool_24 ,c_bool_25 ,c_bool_26 ,c_bool_27 ,c_bool_28 ,c_bool_29 ,c_bool_30 ,c_bool_31 ,c_bool_32 ) SELECT id ,t.rand_int ,t.rand_int::BIT(32) /* ,ARRAY( SELECT (1 << (i - 1)) & t.rand_int FROM generate_series(1, 32) AS i WHERE ((1 << (i - 1)) & t.rand_int > 0) ) */ ,ARRAY( SELECT CASE WHEN (1 << (i - 1)) & t.rand_int > 0 THEN i ELSE -i END FROM generate_series(1, 32) AS i ) ,(1 << 0) & t.rand_int > 0 ,(1 << 1) & t.rand_int > 0 ,(1 << 2) & t.rand_int > 0 ,(1 << 3) & t.rand_int > 0 ,(1 << 4) & t.rand_int > 0 ,(1 << 5) & t.rand_int > 0 ,(1 << 6) & t.rand_int > 0 ,(1 << 7) & t.rand_int > 0 ,(1 << 8) & t.rand_int > 0 ,(1 << 9) & t.rand_int > 0 ,(1 << 10) & t.rand_int > 0 ,(1 << 11) & t.rand_int > 0 ,(1 << 12) & t.rand_int > 0 ,(1 << 13) & t.rand_int > 0 ,(1 << 14) & t.rand_int > 0 ,(1 << 15) & t.rand_int > 0 ,(1 << 16) & t.rand_int > 0 ,(1 << 17) & t.rand_int > 0 ,(1 << 18) & t.rand_int > 0 ,(1 << 19) & t.rand_int > 0 ,(1 << 20) & t.rand_int > 0 ,(1 << 21) & t.rand_int > 0 ,(1 << 22) & t.rand_int > 0 ,(1 << 23) & t.rand_int > 0 ,(1 << 24) & t.rand_int > 0 ,(1 << 25) & t.rand_int > 0 ,(1 << 26) & t.rand_int > 0 ,(1 << 27) & t.rand_int > 0 ,(1 << 28) & t.rand_int > 0 ,(1 << 29) & t.rand_int > 0 ,(1 << 30) & t.rand_int > 0 ,(1 << 31) & t.rand_int > 0 FROM ( SELECT id ,(random() * 4294967295)::BIGINT::BIT(32)::INTEGER AS rand_int FROM generate_series(1, 30000000) AS id ) AS t ;
分析
SELECT gin_clean_pending_list('idx_gin_on_int_arr') ,gin_clean_pending_list('idx_gin_int_on_int') ,gin_clean_pending_list('idx_gin_int_on_bit') ; VACUUM FULL ; ANALYZE t_bitwise ; SELECT COUNT(*) FROM t_bitwise ;
索引大小
SELECT indexname, pg_size_pretty(pg_relation_size(quote_ident(t.tablename)::text)) AS table_size, pg_size_pretty(pg_relation_size(quote_ident(indexrelname)::text)) AS index_size FROM pg_tables t LEFT OUTER JOIN pg_class c ON t.tablename = c.relname LEFT OUTER JOIN ( SELECT c.relname AS ctablename, ipg.relname AS indexname, indexrelname FROM pg_index x JOIN pg_class c ON c.oid = x.indrelid JOIN pg_class ipg ON ipg.oid = x.indexrelid JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid ) AS foo ON t.tablename = foo.ctablename WHERE t.schemaname = 'public' AND t.tablename = 't_bitwise' AND indexname != 't_bitwise_pkey' ;
結果
indexname | table_size | index_size -----------------------------+------------+------------ idx_btree_on_bool_1 | 6893 MB | 643 MB idx_btree_on_bool_2 | 6893 MB | 643 MB idx_btree_on_bool_3 | 6893 MB | 643 MB idx_btree_on_bool_4 | 6893 MB | 643 MB idx_btree_on_bool_5 | 6893 MB | 643 MB idx_btree_on_bool_6 | 6893 MB | 643 MB idx_btree_on_bool_7 | 6893 MB | 643 MB idx_btree_on_bool_8 | 6893 MB | 643 MB idx_btree_on_bool_9 | 6893 MB | 643 MB idx_btree_on_bool_10 | 6893 MB | 643 MB idx_btree_on_bool_11 | 6893 MB | 643 MB idx_btree_on_bool_12 | 6893 MB | 643 MB idx_btree_on_bool_13 | 6893 MB | 643 MB idx_btree_on_bool_14 | 6893 MB | 643 MB idx_btree_on_bool_15 | 6893 MB | 643 MB idx_btree_on_bool_16 | 6893 MB | 643 MB idx_btree_on_bool_17 | 6893 MB | 643 MB idx_btree_on_bool_18 | 6893 MB | 643 MB idx_btree_on_bool_19 | 6893 MB | 643 MB idx_btree_on_bool_20 | 6893 MB | 643 MB idx_btree_on_bool_21 | 6893 MB | 643 MB idx_btree_on_bool_22 | 6893 MB | 643 MB idx_btree_on_bool_23 | 6893 MB | 643 MB idx_btree_on_bool_24 | 6893 MB | 643 MB idx_btree_on_bool_25 | 6893 MB | 643 MB idx_btree_on_bool_26 | 6893 MB | 643 MB idx_btree_on_bool_27 | 6893 MB | 643 MB idx_btree_on_bool_28 | 6893 MB | 643 MB idx_btree_on_bool_29 | 6893 MB | 643 MB idx_btree_on_bool_30 | 6893 MB | 643 MB idx_btree_on_bool_31 | 6893 MB | 643 MB idx_btree_on_bool_32 | 6893 MB | 643 MB idx_btree_on_composite_bool | 6893 MB | 1423 MB idx_bloom_on_bool | 6893 MB | 633 MB idx_bloom_on_int | 6893 MB | 633 MB idx_bloom_on_bit | 6893 MB | 633 MB idx_gin_on_int_arr | 6893 MB | 1030 MB idx_gin_int_on_int | 6893 MB | 1030 MB idx_gin_int_on_bit | 6893 MB | 1030 MB
測試sql
所有執行時間結果均取自第二個查詢結果,因為第一個查詢通常需要一些時間將索引載入到記憶體中。
對 int 列進行全掃描
1位過濾(Parallel Seq Scan,執行時間:122930.731 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE c_int & 1 = 1 ;
16位濾波(並行序列掃描,122896.131 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE c_int & 255 = 255 AND (c_int >> 8) & 255 = 0 ;
bool 列上的 btree 索引
1位過濾(Seq Scan,執行時間:122853.069 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE c_bool_1 IS TRUE ;
16位濾波(Parallel Seq,執行時間:122834.960 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE c_bool_1 IS TRUE AND c_bool_2 IS TRUE AND c_bool_3 IS TRUE AND c_bool_4 IS TRUE AND c_bool_5 IS TRUE AND c_bool_6 IS TRUE AND c_bool_7 IS TRUE AND c_bool_8 IS TRUE AND c_bool_9 IS FALSE AND c_bool_10 IS FALSE AND c_bool_11 IS FALSE AND c_bool_12 IS FALSE AND c_bool_13 IS FALSE AND c_bool_14 IS FALSE AND c_bool_15 IS FALSE AND c_bool_16 IS FALSE ;
bool 列上的複合 btree 索引
16位過濾(使用idx_btree_on_composite_bool,執行時間:293.317 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE c_bool_32 IN (TRUE, FALSE) AND c_bool_31 IN (TRUE, FALSE) AND c_bool_30 IN (TRUE, FALSE) AND c_bool_29 IN (TRUE, FALSE) AND c_bool_28 IN (TRUE, FALSE) AND c_bool_27 IN (TRUE, FALSE) AND c_bool_26 IN (TRUE, FALSE) AND c_bool_25 IN (TRUE, FALSE) AND c_bool_24 IN (TRUE, FALSE) AND c_bool_23 IN (TRUE, FALSE) AND c_bool_22 IN (TRUE, FALSE) AND c_bool_21 IN (TRUE, FALSE) AND c_bool_20 IN (TRUE, FALSE) AND c_bool_19 IN (TRUE, FALSE) AND c_bool_18 IN (TRUE, FALSE) AND c_bool_17 IN (TRUE, FALSE) AND c_bool_16 IS FALSE AND c_bool_15 IS FALSE AND c_bool_14 IS FALSE AND c_bool_13 IS FALSE AND c_bool_12 IS FALSE AND c_bool_11 IS FALSE AND c_bool_10 IS FALSE AND c_bool_9 IS FALSE AND c_bool_8 IS TRUE AND c_bool_7 IS TRUE AND c_bool_6 IS TRUE AND c_bool_5 IS TRUE AND c_bool_4 IS TRUE AND c_bool_3 IS TRUE AND c_bool_2 IS TRUE AND c_bool_1 IS TRUE ;
bool 列上的Bloom索引
1位過濾(Seq Scan,執行時間:122726.850 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE CAST(c_bool_1 AS INTEGER) = 1 ;
16位過濾(使用idx_bloom_on_bool,執行時間:373.581 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE CAST(c_bool_1 AS INTEGER) = 1 AND CAST(c_bool_2 AS INTEGER) = 1 AND CAST(c_bool_3 AS INTEGER) = 1 AND CAST(c_bool_4 AS INTEGER) = 1 AND CAST(c_bool_5 AS INTEGER) = 1 AND CAST(c_bool_6 AS INTEGER) = 1 AND CAST(c_bool_7 AS INTEGER) = 1 AND CAST(c_bool_8 AS INTEGER) = 1 AND CAST(c_bool_9 AS INTEGER) = 0 AND CAST(c_bool_10 AS INTEGER) = 0 AND CAST(c_bool_11 AS INTEGER) = 0 AND CAST(c_bool_12 AS INTEGER) = 0 AND CAST(c_bool_13 AS INTEGER) = 0 AND CAST(c_bool_14 AS INTEGER) = 0 AND CAST(c_bool_15 AS INTEGER) = 0 AND CAST(c_bool_16 AS INTEGER) = 0 ;
int 列上的Bloom索引
1位過濾(Seq Scan,執行時間:122660.620 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE bitcheck_int(c_int, 1) = 1 ;
16位過濾(使用idx_bloom_on_int,執行時間:391.335 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE bitcheck_int(c_int, 1) = 1 AND bitcheck_int(c_int, 2) = 2 AND bitcheck_int(c_int, 3) = 3 AND bitcheck_int(c_int, 4) = 4 AND bitcheck_int(c_int, 5) = 5 AND bitcheck_int(c_int, 6) = 6 AND bitcheck_int(c_int, 7) = 7 AND bitcheck_int(c_int, 8) = 8 AND bitcheck_int(c_int, 9) = -9 AND bitcheck_int(c_int, 10) = -10 AND bitcheck_int(c_int, 11) = -11 AND bitcheck_int(c_int, 12) = -12 AND bitcheck_int(c_int, 13) = -13 AND bitcheck_int(c_int, 14) = -14 AND bitcheck_int(c_int, 15) = -15 AND bitcheck_int(c_int, 16) = -16 ;
位列上的Bloom索引
1位過濾(Seq Scan,執行時間:122434.644 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE bitcheck_bit(c_bit, 1) = 1 ;
16位過濾(使用idx_bloom_on_bit,執行時間:397.157 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE bitcheck_bit(c_bit, 1) = 1 AND bitcheck_bit(c_bit, 2) = 2 AND bitcheck_bit(c_bit, 3) = 3 AND bitcheck_bit(c_bit, 4) = 4 AND bitcheck_bit(c_bit, 5) = 5 AND bitcheck_bit(c_bit, 6) = 6 AND bitcheck_bit(c_bit, 7) = 7 AND bitcheck_bit(c_bit, 8) = 8 AND bitcheck_bit(c_bit, 9) = -9 AND bitcheck_bit(c_bit, 10) = -10 AND bitcheck_bit(c_bit, 11) = -11 AND bitcheck_bit(c_bit, 12) = -12 AND bitcheck_bit(c_bit, 13) = -13 AND bitcheck_bit(c_bit, 14) = -14 AND bitcheck_bit(c_bit, 15) = -15 AND bitcheck_bit(c_bit, 16) = -16 ;
int 數組上的 GIN 索引
1位過濾(使用idx_gin_on_int_arr,執行時間:440942.779 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE c_int_arr @> (ARRAY[1])::_int4 ;
16位過濾(使用idx_gin_on_int_arr,執行時間:15322.953 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE c_int_arr @> (ARRAY[1,2,3,4,5,6,7,8,-9,-10,-11,-12,-13,-14,-15,-16])::_int4 ;
GIN int int 索引
1位過濾(使用idx_gin_int_on_int,執行時間:489909.621 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE convert_int_to_int_arr(c_int) @> (ARRAY[1])::_int4 ;
16位過濾(使用idx_gin_int_on_int,執行時間:15259.772 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE convert_int_to_int_arr(c_int) @> (ARRAY[1,2,3,4,5,6,7,8,-9,-10,-11,-12,-13,-14,-15,-16])::_int4 ;
GIN int 位索引
1位過濾(使用idx_gin_int_on_bit,執行時間:506071.625 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE convert_bit_to_int_arr(c_bit) @> (ARRAY[1])::_int4 ;
16位過濾(使用idx_gin_int_on_bit,執行時間:15292.945 ms)
EXPLAIN ANALYZE SELECT 1 FROM t_bitwise WHERE convert_bit_to_int_arr(c_bit) @> (ARRAY[1,2,3,4,5,6,7,8,-9,-10,-11,-12,-13,-14,-15,-16])::_int4 ;
我的想法
- Bloom 索引似乎比其他索引更好,因為它的索引大小更小,執行時間也不錯。(但無論如何,在低基數的情況下,任何查詢計劃都會導致全掃描)
- 您的專欄是 20000 位!我無法測試它。如果要使用bloom index,則必須將其拆分,因為bloom index最多可以有32個普通列或表達式列。