Postgresql

PostgreSQL 位串的索引(每個最多 20,000 位)

  • January 7, 2022

我正在建構一個包含化合物(數百萬行)的表,並且在這些化合物中,某些預定的特徵/片段被標記在一個固定長度的位串中。這個位串將有 2000 到 20000 位,需要做進一步的研究來確定更精確的數字。當搜尋具有某些特定特徵或缺乏特定特徵的化合物時,將在該位串的選定子集上進行搜尋。每次都可以是不同的子集。

是否有一種索引類型可以使這些搜尋在 PostgreSQL(9.6 或 10)中高效?

插入很少發生並且在批處理過程中完成,而搜尋是最常用的操作,並且最好應該快速並且沒有誤報或漏報。

對我來說,這模糊地聽起來像是 GIN 索引的工作,但我對這種索引類型的理解不足以確定是否真的如此。

實際上可能有另一種解決方案,那就是創建一個單獨的“fragment_index”表,帶有片段標識符(因為它們在位串中具有固定位置,它們也有一個數字標識符)+複合ID對。我擔心該表可能會變得巨大(20M 複合,平均 50 次命中片段 = 1G 行)和針對它的多個連接(每個片段一個),其中該連接還可以返回高達 80% 的與復合表匹配(在某些情況下這很可能)根本不會執行。

我將不勝感激得到任何建議,以實現這一目標。

更新:我嘗試了在帶有編碼短程式碼的 varchar 數組上使用 trigram 模組的 GIN 索引,它給出了混合結果,主要取決於過濾操作後剩餘的數據量。

為了給出具有任何意義的範例,我們假設該表如下所示:

CREATE TABLE testcompounds (
id serial primary key,
cd_structure text,
features_as_text varchar(128),
features_as_bits bit varying(32)
);

CREATE INDEX flags_testcompounds on testcompounds using gin (features_as_text gin_trgm_ops);


CREATE TABLE fragments (
id serial primary key,
smarts text,
keystring varchar(4),
frequency int 
);


insert into fragments (keystring,smarts) values('AAA', '*=O');
insert into fragments (keystring,smarts) values('AAB', '[#6]=O');
insert into fragments (keystring,smarts) values('AAC', '[#7]=O');
...
insert into fragments (keystring,smarts) values('AAN', '[#6]-F');
insert into fragments (keystring,smarts) values('AAO', '[#6]-Cl');
insert into fragments (keystring,smarts) values('AAP', '[#6]-Br');
...
etc.

並且 features_as_text 和 features_as_bits 欄位已完全填充。

可以在此執行的查詢範例如下:

select id, cd_structure from testcompounds 
where 
   (features_as_bits & (B'00000000000000000000000000000001' << (2-1)) = (B'00000000000000000000000000000001' << (2-1))) 

AND (features_as_bits & (B'00000000000000000000000000000001' << (18-1)) = (B'00000000000000000000000000000000')) 
AND (features_as_bits & (B'00000000000000000000000000000001' << (19-1)) = (B'00000000000000000000000000000000')) 
AND (features_as_bits & (B'00000000000000000000000000000001' << (5-1)) = (B'00000000000000000000000000000000')) 
AND (features_as_bits & (B'00000000000000000000000000000001' << (6-1)) = (B'00000000000000000000000000000000')) 
AND (features_as_bits & (B'00000000000000000000000000000001' << (7-1)) = (B'00000000000000000000000000000000')) 
AND (features_as_bits & (B'00000000000000000000000000000001' << (8-1)) = (B'00000000000000000000000000000000')) 
AND (features_as_bits & (B'00000000000000000000000000000001' << (9-1)) = (B'00000000000000000000000000000000')) 

換句話說:獲取具有特徵 2 且不具有任何特徵 18、19、5、6、7、8、9 的所有內容

我確實測試了一些按位運算的索引策略。過程有點長,不好意思。

首先,

  • 這只是對我的RDS環境的一個測試。您的數據庫性能取決於您的環境和數據(CPU、記憶體、磁碟、表中的行數、列的基數等)
  • 我不熟悉 PostgreSQL。特別是,我不確定Bloom索引的哪個位長最好。(有人可以幫我嗎?)
  • 通常,較小的索引大小是首選。因此,表分區可能是一個很好的解決方案。

環境


  • 版本:PostgreSQL 10.3
  • 主機:Amazon RDS db.m4.large 的新實例(vCPU:2,記憶體:8GB,磁碟:GP2-SSD 100GB)

表/索引定義


創建擴展

CREATE EXTENSION IF NOT EXISTS intarray;
CREATE EXTENSION IF NOT EXISTS bloom;

創建表

CREATE TABLE IF NOT EXISTS t_bitwise (
 id BIGINT
 ,c_int INTEGER
 ,c_bit BIT(31)
 ,c_int_arr _int4
 ,c_bool_0 BOOLEAN ,c_bool_1 BOOLEAN ,c_bool_2 BOOLEAN ,c_bool_3 BOOLEAN ,c_bool_4 BOOLEAN
 ,c_bool_5 BOOLEAN ,c_bool_6 BOOLEAN ,c_bool_7 BOOLEAN ,c_bool_8 BOOLEAN ,c_bool_9 BOOLEAN
 ,c_bool_10 BOOLEAN ,c_bool_11 BOOLEAN ,c_bool_12 BOOLEAN ,c_bool_13 BOOLEAN ,c_bool_14 BOOLEAN
 ,c_bool_15 BOOLEAN ,c_bool_16 BOOLEAN ,c_bool_17 BOOLEAN ,c_bool_18 BOOLEAN ,c_bool_19 BOOLEAN
 ,c_bool_20 BOOLEAN ,c_bool_21 BOOLEAN ,c_bool_22 BOOLEAN ,c_bool_23 BOOLEAN ,c_bool_24 BOOLEAN
 ,c_bool_25 BOOLEAN ,c_bool_26 BOOLEAN ,c_bool_27 BOOLEAN ,c_bool_28 BOOLEAN ,c_bool_29 BOOLEAN
 ,c_bool_30 BOOLEAN
 ,PRIMARY KEY (id)
);

在每個布爾列上創建一個 btree 索引

CREATE INDEX IF NOT EXISTS idx_btree_on_bool_1 ON t_bitwise (c_bool_1);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_2 ON t_bitwise (c_bool_2);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_3 ON t_bitwise (c_bool_3);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_4 ON t_bitwise (c_bool_4);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_5 ON t_bitwise (c_bool_5);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_6 ON t_bitwise (c_bool_6);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_7 ON t_bitwise (c_bool_7);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_8 ON t_bitwise (c_bool_8);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_9 ON t_bitwise (c_bool_9);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_10 ON t_bitwise (c_bool_10);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_11 ON t_bitwise (c_bool_11);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_12 ON t_bitwise (c_bool_12);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_13 ON t_bitwise (c_bool_13);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_14 ON t_bitwise (c_bool_14);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_15 ON t_bitwise (c_bool_15);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_16 ON t_bitwise (c_bool_16);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_17 ON t_bitwise (c_bool_17);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_18 ON t_bitwise (c_bool_18);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_19 ON t_bitwise (c_bool_19);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_20 ON t_bitwise (c_bool_20);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_21 ON t_bitwise (c_bool_21);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_22 ON t_bitwise (c_bool_22);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_23 ON t_bitwise (c_bool_23);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_24 ON t_bitwise (c_bool_24);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_25 ON t_bitwise (c_bool_25);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_26 ON t_bitwise (c_bool_26);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_27 ON t_bitwise (c_bool_27);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_28 ON t_bitwise (c_bool_28);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_29 ON t_bitwise (c_bool_29);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_30 ON t_bitwise (c_bool_30);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_31 ON t_bitwise (c_bool_31);
CREATE INDEX IF NOT EXISTS idx_btree_on_bool_32 ON t_bitwise (c_bool_32);

在 bool 列上創建一個複合 btree 索引

CREATE INDEX IF NOT EXISTS idx_btree_on_composite_bool ON t_bitwise (
 c_bool_32 ,c_bool_31
 ,c_bool_30 ,c_bool_29 ,c_bool_28 ,c_bool_27 ,c_bool_26 ,c_bool_25 ,c_bool_24 ,c_bool_23 ,c_bool_22 ,c_bool_21
 ,c_bool_20 ,c_bool_19 ,c_bool_18 ,c_bool_17 ,c_bool_16 ,c_bool_15 ,c_bool_14 ,c_bool_13 ,c_bool_12 ,c_bool_11
 ,c_bool_10 ,c_bool_9 ,c_bool_8 ,c_bool_7 ,c_bool_6 ,c_bool_5 ,c_bool_4 ,c_bool_3 ,c_bool_2 ,c_bool_1
)
;

在 bool 列上創建一個Bloom索引

CREATE INDEX IF NOT EXISTS idx_bloom_on_bool ON t_bitwise
 USING bloom (
   CAST(c_bool_1 AS INTEGER) ,CAST(c_bool_2 AS INTEGER) ,CAST(c_bool_3 AS INTEGER) ,CAST(c_bool_4 AS INTEGER) ,CAST(c_bool_5 AS INTEGER)
   ,CAST(c_bool_6 AS INTEGER) ,CAST(c_bool_7 AS INTEGER) ,CAST(c_bool_8 AS INTEGER) ,CAST(c_bool_9 AS INTEGER) ,CAST(c_bool_10 AS INTEGER)
   ,CAST(c_bool_11 AS INTEGER) ,CAST(c_bool_12 AS INTEGER) ,CAST(c_bool_13 AS INTEGER) ,CAST(c_bool_14 AS INTEGER) ,CAST(c_bool_15 AS INTEGER)
   ,CAST(c_bool_16 AS INTEGER) ,CAST(c_bool_17 AS INTEGER) ,CAST(c_bool_18 AS INTEGER) ,CAST(c_bool_19 AS INTEGER) ,CAST(c_bool_20 AS INTEGER)
   ,CAST(c_bool_21 AS INTEGER) ,CAST(c_bool_22 AS INTEGER) ,CAST(c_bool_23 AS INTEGER) ,CAST(c_bool_24 AS INTEGER) ,CAST(c_bool_25 AS INTEGER)
   ,CAST(c_bool_26 AS INTEGER) ,CAST(c_bool_27 AS INTEGER) ,CAST(c_bool_28 AS INTEGER) ,CAST(c_bool_29 AS INTEGER) ,CAST(c_bool_30 AS INTEGER)
   ,CAST(c_bool_31 AS INTEGER) ,CAST(c_bool_32 AS INTEGER)
 ) WITH (length=128)
;

在 int 列上創建一個Bloom索引

CREATE OR REPLACE FUNCTION bitcheck_int(INTEGER, INTEGER)
RETURNS INTEGER
LANGUAGE SQL
IMMUTABLE
SECURITY INVOKER
PARALLEL SAFE
COST 0.01
AS $$
 SELECT CASE
   WHEN $1 & (1 << ($2 - 1)) > 0 THEN $2
   ELSE -$2
 END
$$
;
CREATE INDEX IF NOT EXISTS idx_bloom_on_int ON t_bitwise
 USING bloom (
   bitcheck_int(c_int, 1)
   ,bitcheck_int(c_int, 2)
   ,bitcheck_int(c_int, 3)
   ,bitcheck_int(c_int, 4)
   ,bitcheck_int(c_int, 5)
   ,bitcheck_int(c_int, 6)
   ,bitcheck_int(c_int, 7)
   ,bitcheck_int(c_int, 8)
   ,bitcheck_int(c_int, 9)
   ,bitcheck_int(c_int, 10)
   ,bitcheck_int(c_int, 11)
   ,bitcheck_int(c_int, 12)
   ,bitcheck_int(c_int, 13)
   ,bitcheck_int(c_int, 14)
   ,bitcheck_int(c_int, 15)
   ,bitcheck_int(c_int, 16)
   ,bitcheck_int(c_int, 17)
   ,bitcheck_int(c_int, 18)
   ,bitcheck_int(c_int, 19)
   ,bitcheck_int(c_int, 20)
   ,bitcheck_int(c_int, 21)
   ,bitcheck_int(c_int, 22)
   ,bitcheck_int(c_int, 23)
   ,bitcheck_int(c_int, 24)
   ,bitcheck_int(c_int, 25)
   ,bitcheck_int(c_int, 26)
   ,bitcheck_int(c_int, 27)
   ,bitcheck_int(c_int, 28)
   ,bitcheck_int(c_int, 29)
   ,bitcheck_int(c_int, 20)
   ,bitcheck_int(c_int, 31)
   ,bitcheck_int(c_int, 32)
 ) WITH (length=128)
;

在位列上創建一個bloom索引

CREATE OR REPLACE FUNCTION bitcheck_bit(BIT(32), INTEGER)
RETURNS INTEGER
LANGUAGE SQL
IMMUTABLE
SECURITY INVOKER
PARALLEL SAFE
COST 0.01
AS $$
 SELECT CASE
   WHEN $1::INTEGER & (1 << ($2 - 1)) > 0 THEN $2
   ELSE -$2
 END
$$
;
CREATE INDEX IF NOT EXISTS idx_bloom_on_bit ON t_bitwise
 USING bloom (
   bitcheck_bit(c_bit, 1)
   ,bitcheck_bit(c_bit, 2)
   ,bitcheck_bit(c_bit, 3)
   ,bitcheck_bit(c_bit, 4)
   ,bitcheck_bit(c_bit, 5)
   ,bitcheck_bit(c_bit, 6)
   ,bitcheck_bit(c_bit, 7)
   ,bitcheck_bit(c_bit, 8)
   ,bitcheck_bit(c_bit, 9)
   ,bitcheck_bit(c_bit, 10)
   ,bitcheck_bit(c_bit, 11)
   ,bitcheck_bit(c_bit, 12)
   ,bitcheck_bit(c_bit, 13)
   ,bitcheck_bit(c_bit, 14)
   ,bitcheck_bit(c_bit, 15)
   ,bitcheck_bit(c_bit, 16)
   ,bitcheck_bit(c_bit, 17)
   ,bitcheck_bit(c_bit, 18)
   ,bitcheck_bit(c_bit, 19)
   ,bitcheck_bit(c_bit, 20)
   ,bitcheck_bit(c_bit, 21)
   ,bitcheck_bit(c_bit, 22)
   ,bitcheck_bit(c_bit, 23)
   ,bitcheck_bit(c_bit, 24)
   ,bitcheck_bit(c_bit, 25)
   ,bitcheck_bit(c_bit, 26)
   ,bitcheck_bit(c_bit, 27)
   ,bitcheck_bit(c_bit, 28)
   ,bitcheck_bit(c_bit, 29)
   ,bitcheck_bit(c_bit, 20)
   ,bitcheck_bit(c_bit, 31)
   ,bitcheck_bit(c_bit, 32)
 ) WITH (length=128)
;

在 int 數組列上創建一個 GIN 索引

CREATE INDEX IF NOT EXISTS idx_gin_on_int_arr ON t_bitwise
 USING GIN(c_int_arr gin__int_ops)
;

在 int 列上創建一個 GIN int 索引

CREATE OR REPLACE FUNCTION convert_int_to_int_arr(INTEGER)
RETURNS _int4
LANGUAGE SQL
IMMUTABLE
STRICT
SECURITY INVOKER
PARALLEL SAFE
COST 0.01
AS $$
 SELECT ARRAY(
   SELECT
     CASE
       WHEN (1 << (i - 1)) & $1 > 0 THEN i
       ELSE -i
     END
   FROM generate_series(1, 32) AS i
 )
$$
;
CREATE INDEX IF NOT EXISTS idx_gin_int_on_int ON t_bitwise
 USING GIN(convert_int_to_int_arr(c_int) gin__int_ops)
;

在位列上創建一個 GIN int 索引

CREATE OR REPLACE FUNCTION convert_bit_to_int_arr(BIT(32))
RETURNS _int4
LANGUAGE SQL
IMMUTABLE
STRICT
SECURITY INVOKER
PARALLEL SAFE
COST 0.01
AS $$
 SELECT ARRAY(
   SELECT
     CASE
       WHEN (1 << (i - 1)) & $1::INTEGER > 0 THEN i
       ELSE -i
     END
   FROM generate_series(1, 32) AS i
 )
$$
;
CREATE INDEX IF NOT EXISTS idx_gin_int_on_bit ON t_bitwise
 USING GIN(convert_bit_to_int_arr(c_bit) gin__int_ops)
;

插入


插入

這需要很長時間。

INSERT INTO t_bitwise (
 id ,c_int ,c_bit ,c_int_arr
 ,c_bool_1 ,c_bool_2 ,c_bool_3 ,c_bool_4 ,c_bool_5 ,c_bool_6 ,c_bool_7 ,c_bool_8 ,c_bool_9 ,c_bool_10
 ,c_bool_11 ,c_bool_12 ,c_bool_13 ,c_bool_14 ,c_bool_15 ,c_bool_16 ,c_bool_17 ,c_bool_18 ,c_bool_19 ,c_bool_20
 ,c_bool_21 ,c_bool_22 ,c_bool_23 ,c_bool_24 ,c_bool_25 ,c_bool_26 ,c_bool_27 ,c_bool_28 ,c_bool_29 ,c_bool_30
 ,c_bool_31 ,c_bool_32
) SELECT
 id
 ,t.rand_int
 ,t.rand_int::BIT(32)
 /*
 ,ARRAY(
   SELECT
       (1 << (i - 1)) & t.rand_int
   FROM generate_series(1, 32) AS i
   WHERE
     ((1 << (i - 1)) & t.rand_int > 0)
 )
 */
 ,ARRAY(
   SELECT
       CASE
       WHEN (1 << (i - 1)) & t.rand_int > 0 THEN i
       ELSE -i
     END
   FROM generate_series(1, 32) AS i
 )
 ,(1 << 0) & t.rand_int > 0 ,(1 << 1) & t.rand_int > 0 ,(1 << 2) & t.rand_int > 0 ,(1 << 3) & t.rand_int > 0
 ,(1 << 4) & t.rand_int > 0 ,(1 << 5) & t.rand_int > 0 ,(1 << 6) & t.rand_int > 0 ,(1 << 7) & t.rand_int > 0
 ,(1 << 8) & t.rand_int > 0 ,(1 << 9) & t.rand_int > 0 ,(1 << 10) & t.rand_int > 0 ,(1 << 11) & t.rand_int > 0
 ,(1 << 12) & t.rand_int > 0 ,(1 << 13) & t.rand_int > 0 ,(1 << 14) & t.rand_int > 0 ,(1 << 15) & t.rand_int > 0
 ,(1 << 16) & t.rand_int > 0 ,(1 << 17) & t.rand_int > 0 ,(1 << 18) & t.rand_int > 0 ,(1 << 19) & t.rand_int > 0
 ,(1 << 20) & t.rand_int > 0 ,(1 << 21) & t.rand_int > 0 ,(1 << 22) & t.rand_int > 0 ,(1 << 23) & t.rand_int > 0
 ,(1 << 24) & t.rand_int > 0 ,(1 << 25) & t.rand_int > 0 ,(1 << 26) & t.rand_int > 0 ,(1 << 27) & t.rand_int > 0
 ,(1 << 28) & t.rand_int > 0 ,(1 << 29) & t.rand_int > 0 ,(1 << 30) & t.rand_int > 0 ,(1 << 31) & t.rand_int > 0
FROM (
 SELECT
   id
   ,(random() * 4294967295)::BIGINT::BIT(32)::INTEGER AS rand_int
 FROM generate_series(1, 30000000) AS id
) AS t
;

分析

SELECT
 gin_clean_pending_list('idx_gin_on_int_arr')
 ,gin_clean_pending_list('idx_gin_int_on_int')
 ,gin_clean_pending_list('idx_gin_int_on_bit')
;
VACUUM FULL
;
ANALYZE t_bitwise
;
SELECT COUNT(*) FROM t_bitwise
;

索引大小

SELECT
   indexname,
   pg_size_pretty(pg_relation_size(quote_ident(t.tablename)::text)) AS table_size,
   pg_size_pretty(pg_relation_size(quote_ident(indexrelname)::text)) AS index_size
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename = c.relname
LEFT OUTER JOIN (
 SELECT
   c.relname AS ctablename, ipg.relname AS indexname, indexrelname
 FROM pg_index x
 JOIN pg_class c ON c.oid = x.indrelid
 JOIN pg_class ipg ON ipg.oid = x.indexrelid
 JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid
) AS foo ON t.tablename = foo.ctablename
WHERE t.schemaname = 'public' AND t.tablename = 't_bitwise' AND indexname != 't_bitwise_pkey'
;

結果

         indexname          | table_size | index_size 
-----------------------------+------------+------------
idx_btree_on_bool_1         | 6893 MB    | 643 MB
idx_btree_on_bool_2         | 6893 MB    | 643 MB
idx_btree_on_bool_3         | 6893 MB    | 643 MB
idx_btree_on_bool_4         | 6893 MB    | 643 MB
idx_btree_on_bool_5         | 6893 MB    | 643 MB
idx_btree_on_bool_6         | 6893 MB    | 643 MB
idx_btree_on_bool_7         | 6893 MB    | 643 MB
idx_btree_on_bool_8         | 6893 MB    | 643 MB
idx_btree_on_bool_9         | 6893 MB    | 643 MB
idx_btree_on_bool_10        | 6893 MB    | 643 MB
idx_btree_on_bool_11        | 6893 MB    | 643 MB
idx_btree_on_bool_12        | 6893 MB    | 643 MB
idx_btree_on_bool_13        | 6893 MB    | 643 MB
idx_btree_on_bool_14        | 6893 MB    | 643 MB
idx_btree_on_bool_15        | 6893 MB    | 643 MB
idx_btree_on_bool_16        | 6893 MB    | 643 MB
idx_btree_on_bool_17        | 6893 MB    | 643 MB
idx_btree_on_bool_18        | 6893 MB    | 643 MB
idx_btree_on_bool_19        | 6893 MB    | 643 MB
idx_btree_on_bool_20        | 6893 MB    | 643 MB
idx_btree_on_bool_21        | 6893 MB    | 643 MB
idx_btree_on_bool_22        | 6893 MB    | 643 MB
idx_btree_on_bool_23        | 6893 MB    | 643 MB
idx_btree_on_bool_24        | 6893 MB    | 643 MB
idx_btree_on_bool_25        | 6893 MB    | 643 MB
idx_btree_on_bool_26        | 6893 MB    | 643 MB
idx_btree_on_bool_27        | 6893 MB    | 643 MB
idx_btree_on_bool_28        | 6893 MB    | 643 MB
idx_btree_on_bool_29        | 6893 MB    | 643 MB
idx_btree_on_bool_30        | 6893 MB    | 643 MB
idx_btree_on_bool_31        | 6893 MB    | 643 MB
idx_btree_on_bool_32        | 6893 MB    | 643 MB
idx_btree_on_composite_bool | 6893 MB    | 1423 MB
idx_bloom_on_bool           | 6893 MB    | 633 MB
idx_bloom_on_int            | 6893 MB    | 633 MB
idx_bloom_on_bit            | 6893 MB    | 633 MB
idx_gin_on_int_arr          | 6893 MB    | 1030 MB
idx_gin_int_on_int          | 6893 MB    | 1030 MB
idx_gin_int_on_bit          | 6893 MB    | 1030 MB

測試sql


所有執行時間結果均取自第二個查詢結果,因為第一個查詢通常需要一些時間將索引載入到記憶體中。

對 int 列進行全掃描

1位過濾(Parallel Seq Scan,執行時間:122930.731 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 c_int & 1 = 1
;

16位濾波(並行序列掃描,122896.131 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 c_int & 255 = 255
 AND (c_int >> 8) & 255 = 0
;

bool 列上的 btree 索引

1位過濾(Seq Scan,執行時間:122853.069 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 c_bool_1 IS TRUE
;

16位濾波(Parallel Seq,執行時間:122834.960 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 c_bool_1 IS TRUE
 AND c_bool_2 IS TRUE
 AND c_bool_3 IS TRUE
 AND c_bool_4 IS TRUE
 AND c_bool_5 IS TRUE
 AND c_bool_6 IS TRUE
 AND c_bool_7 IS TRUE
 AND c_bool_8 IS TRUE
 AND c_bool_9 IS FALSE
 AND c_bool_10 IS FALSE
 AND c_bool_11 IS FALSE
 AND c_bool_12 IS FALSE
 AND c_bool_13 IS FALSE
 AND c_bool_14 IS FALSE
 AND c_bool_15 IS FALSE
 AND c_bool_16 IS FALSE
;

bool 列上的複合 btree 索引

16位過濾(使用idx_btree_on_composite_bool,執行時間:293.317 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 c_bool_32 IN (TRUE, FALSE)
 AND c_bool_31 IN (TRUE, FALSE)
 AND c_bool_30 IN (TRUE, FALSE)
 AND c_bool_29 IN (TRUE, FALSE)
 AND c_bool_28 IN (TRUE, FALSE)
 AND c_bool_27 IN (TRUE, FALSE)
 AND c_bool_26 IN (TRUE, FALSE)
 AND c_bool_25 IN (TRUE, FALSE)
 AND c_bool_24 IN (TRUE, FALSE)
 AND c_bool_23 IN (TRUE, FALSE)
 AND c_bool_22 IN (TRUE, FALSE)
 AND c_bool_21 IN (TRUE, FALSE)
 AND c_bool_20 IN (TRUE, FALSE)
 AND c_bool_19 IN (TRUE, FALSE)
 AND c_bool_18 IN (TRUE, FALSE)
 AND c_bool_17 IN (TRUE, FALSE)
 AND c_bool_16 IS FALSE
 AND c_bool_15 IS FALSE
 AND c_bool_14 IS FALSE
 AND c_bool_13 IS FALSE
 AND c_bool_12 IS FALSE
 AND c_bool_11 IS FALSE
 AND c_bool_10 IS FALSE
 AND c_bool_9 IS FALSE
 AND c_bool_8 IS TRUE
 AND c_bool_7 IS TRUE
 AND c_bool_6 IS TRUE
 AND c_bool_5 IS TRUE
 AND c_bool_4 IS TRUE
 AND c_bool_3 IS TRUE
 AND c_bool_2 IS TRUE
 AND c_bool_1 IS TRUE
;

bool 列上的Bloom索引

1位過濾(Seq Scan,執行時間:122726.850 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 CAST(c_bool_1 AS INTEGER) = 1
;

16位過濾(使用idx_bloom_on_bool,執行時間:373.581 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 CAST(c_bool_1 AS INTEGER) = 1
 AND CAST(c_bool_2 AS INTEGER) = 1
 AND CAST(c_bool_3 AS INTEGER) = 1
 AND CAST(c_bool_4 AS INTEGER) = 1
 AND CAST(c_bool_5 AS INTEGER) = 1
 AND CAST(c_bool_6 AS INTEGER) = 1
 AND CAST(c_bool_7 AS INTEGER) = 1
 AND CAST(c_bool_8 AS INTEGER) = 1
 AND CAST(c_bool_9 AS INTEGER) = 0
 AND CAST(c_bool_10 AS INTEGER) = 0
 AND CAST(c_bool_11 AS INTEGER) = 0
 AND CAST(c_bool_12 AS INTEGER) = 0
 AND CAST(c_bool_13 AS INTEGER) = 0
 AND CAST(c_bool_14 AS INTEGER) = 0
 AND CAST(c_bool_15 AS INTEGER) = 0
 AND CAST(c_bool_16 AS INTEGER) = 0
;

int 列上的Bloom索引

1位過濾(Seq Scan,執行時間:122660.620 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 bitcheck_int(c_int, 1) = 1
;

16位過濾(使用idx_bloom_on_int,執行時間:391.335 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 bitcheck_int(c_int, 1) = 1
 AND bitcheck_int(c_int, 2) = 2
 AND bitcheck_int(c_int, 3) = 3
 AND bitcheck_int(c_int, 4) = 4
 AND bitcheck_int(c_int, 5) = 5
 AND bitcheck_int(c_int, 6) = 6
 AND bitcheck_int(c_int, 7) = 7
 AND bitcheck_int(c_int, 8) = 8
 AND bitcheck_int(c_int, 9) = -9
 AND bitcheck_int(c_int, 10) = -10
 AND bitcheck_int(c_int, 11) = -11
 AND bitcheck_int(c_int, 12) = -12
 AND bitcheck_int(c_int, 13) = -13
 AND bitcheck_int(c_int, 14) = -14
 AND bitcheck_int(c_int, 15) = -15
 AND bitcheck_int(c_int, 16) = -16
;

位列上的Bloom索引

1位過濾(Seq Scan,執行時間:122434.644 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 bitcheck_bit(c_bit, 1) = 1
;

16位過濾(使用idx_bloom_on_bit,執行時間:397.157 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 bitcheck_bit(c_bit, 1) = 1
 AND bitcheck_bit(c_bit, 2) = 2
 AND bitcheck_bit(c_bit, 3) = 3
 AND bitcheck_bit(c_bit, 4) = 4
 AND bitcheck_bit(c_bit, 5) = 5
 AND bitcheck_bit(c_bit, 6) = 6
 AND bitcheck_bit(c_bit, 7) = 7
 AND bitcheck_bit(c_bit, 8) = 8
 AND bitcheck_bit(c_bit, 9) = -9
 AND bitcheck_bit(c_bit, 10) = -10
 AND bitcheck_bit(c_bit, 11) = -11
 AND bitcheck_bit(c_bit, 12) = -12
 AND bitcheck_bit(c_bit, 13) = -13
 AND bitcheck_bit(c_bit, 14) = -14
 AND bitcheck_bit(c_bit, 15) = -15
 AND bitcheck_bit(c_bit, 16) = -16
;

int 數組上的 GIN 索引

1位過濾(使用idx_gin_on_int_arr,執行時間:440942.779 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 c_int_arr @> (ARRAY[1])::_int4
;

16位過濾(使用idx_gin_on_int_arr,執行時間:15322.953 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 c_int_arr @> (ARRAY[1,2,3,4,5,6,7,8,-9,-10,-11,-12,-13,-14,-15,-16])::_int4
;

GIN int int 索引

1位過濾(使用idx_gin_int_on_int,執行時間:489909.621 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 convert_int_to_int_arr(c_int) @> (ARRAY[1])::_int4
;

16位過濾(使用idx_gin_int_on_int,執行時間:15259.772 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 convert_int_to_int_arr(c_int) @> (ARRAY[1,2,3,4,5,6,7,8,-9,-10,-11,-12,-13,-14,-15,-16])::_int4
;

GIN int 位索引

1位過濾(使用idx_gin_int_on_bit,執行時間:506071.625 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 convert_bit_to_int_arr(c_bit) @> (ARRAY[1])::_int4
;

16位過濾(使用idx_gin_int_on_bit,執行時間:15292.945 ms)

EXPLAIN ANALYZE SELECT 1 FROM t_bitwise
WHERE
 convert_bit_to_int_arr(c_bit) @> (ARRAY[1,2,3,4,5,6,7,8,-9,-10,-11,-12,-13,-14,-15,-16])::_int4
;

我的想法

  • Bloom 索引似乎比其他索引更好,因為它的索引大小更小,執行時間也不錯。(但無論如何,在低基數的情況下,任何查詢計劃都會導致全掃描)
  • 您的專欄是 20000 位!我無法測試它。如果要使用bloom index,則必須將其拆分,因為bloom index最多可以有32個普通列或表達式列。

引用自:https://dba.stackexchange.com/questions/198711