Postgresql

在子查詢連接子句中使用 CASE 查詢在 PostgreSQL 上非常慢,但在 Oracle 和 MariaDB 上卻很快

  • October 14, 2021

我們有一個面向屬性的數據管理系統,具有雙時間性和靈活的租戶優先級邏輯。記錄的刪除是通過在 DELETED 狀態下插入同一行來完成的。租戶可以繼承其他租戶的數據並在自己的租戶中更改/覆蓋它,這不會影響原始數據。可以為每個租戶更改租戶的優先級。這兩個特性意味著我們需要根據這些優先級刪除一些數據,我們通過一個在其連接條件中包含 CASE 語句的子查詢來解決它:

SELECT bd.INSTANCE_ID,
      bd.ATTRIBUTE_ID,
      bd.INSERTED,
      bd.MODIFIED_RECORD,
      bd.EFFECTIVE,
      bd.DISCONTINUE,
      bd.STATUS,
      bd.TENANT,
      bd.VALUE 
FROM MY_DATA bd
 WHERE 1 = 1 
 AND bd.INSERTED <= ?  
 AND bd.INSTANCE_ID IN (SELECT INSTANCE_ID 
                        FROM MY_DATA  
                        WHERE STATUS IN (?) 
                          AND INSERTED <= ?  
                          AND TENANT IN (?, ?)  ) 
 AND ( bd.STATUS <> 'DELETED' )  
 AND bd.ATTRIBUTE_ID IN (?, ?, ?, ?, ?, ?, ?, ?)  
 AND (bd.INSTANCE_ID, bd.ATTRIBUTE_ID, bd.INSERTED) 
     NOT IN (SELECT dc.INSTANCE_ID, dc.ATTRIBUTE_ID, dc.MODIFIED_RECORD 
             FROM MY_DATA dc 
             WHERE dc.STATUS = 'DELETED' 
               AND dc.MODIFIED_RECORD IS NOT NULL 
               AND dc.INSERTED <= ?  
               AND dc.TENANT IN (?, ?)
               AND CASE 
                     WHEN bd.TENANT = 1 THEN 1 
                     WHEN bd.TENANT = 2 THEN 3 
                     ELSE -1 
                   END <= CASE WHEN dc.TENANT = 1 THEN 1 WHEN dc.TENANT = 2 THEN 3 ELSE -1 END
                 ) 
 AND bd.TENANT IN (?, ?)
ORDER BY bd.INSTANCE_ID asc, bd.ATTRIBUTE_ID asc, 
     CASE WHEN bd.TENANT = 1 THEN 1 WHEN bd.TENANT = 2 THEN 3 ELSE -1 END asc, 
     bd.EFFECTIVE asc, 
     bd.INSERTED asc

現在我們有一個包含約 120000 行的測試系統,其中約 300 行處於 DELETED 狀態。所有行都屬於同一個租戶,因此 CASE 語句不是必需的。但是由於我們使用通用查詢建構,它的存在由另一個表中的條目(使用者可定義的租戶優先級)決定。並且由於需要該功能,優化查詢以省略子句只是一種解決方法,而不是解決方案。

當我們在 PostgreSQL 上執行這個查詢時,它非常慢,大約 20 分鐘的執行時間。當我們在 Oracle 或 MariaDB(安裝在同一台機器上)上對相同的數據執行相同的查詢時,只需幾秒鐘。

這是 PostgreSQL EXPLAIN (ANALYZE, BUFFERS) 的結果:

QUERY PLAN
Sort  (cost=133536462.09..133536522.26 rows=24068 width=125) (actual time=825446.289..825453.855 rows=47875 loops=1)
Sort Key: bd.instance_id, bd.attribute_id, (CASE WHEN (bd.tenant = 1) THEN 1 WHEN (bd.tenant = 2) THEN 3 ELSE '-1'::integer END), bd.effective, bd.inserted
Sort Method: external merge  Disk: 4608kB
Buffers: shared hit=87702007, temp read=576 written=577
->  Nested Loop  (cost=4675.23..133534710.56 rows=24068 width=125) (actual time=125.801..825062.707 rows=47875 loops=1)
      Buffers: shared hit=87702004
      ->  HashAggregate  (cost=4674.81..4787.10 rows=11229 width=8) (actual time=100.839..147.466 rows=11192 loops=1)
            Group Key: my_data.instance_id
            Batches: 1  Memory Usage: 1169kB
            Buffers: shared hit=1822
            ->  Seq Scan on my_data  (cost=0.00..4358.54 rows=126506 width=8) (actual time=0.025..69.096 rows=126525 loops=1)
                  Filter: ((tenant = ANY ('{1,2}'::bigint[])) AND ((status)::text = 'RELEASED'::text) AND (inserted <= CURRENT_TIMESTAMP))
                  Rows Removed by Filter: 302
                  Buffers: shared hit=1822
      ->  Index Scan using my_data_pkey on my_data bd  (cost=0.42..11891.49 rows=2 width=121) (actual time=17.175..73.675 rows=4 loops=11192)
            Index Cond: ((instance_id = my_data.instance_id) AND ((attribute_id)::text = ANY ('{code,title,city,country,att1, att2}'::text[])) AND (inserted <= CURRENT_TIMESTAMP))
            Filter: (((status)::text <> 'DELETED'::text) AND (tenant = ANY ('{1,2}'::bigint[])) AND (NOT (SubPlan 1)))
            Rows Removed by Filter: 0
            Buffers: shared hit=87700182
            SubPlan 1
              ->  Seq Scan on my_data dc  (cost=0.00..5943.88 rows=1 width=25) (actual time=0.078..17.028 rows=301 loops=48177)
                    Filter: ((modified_record IS NOT NULL) AND (tenant = ANY ('{1,2}'::bigint[])) AND ((status)::text = 'DELETED'::text) AND (inserted <= CURRENT_TIMESTAMP) AND (CASE WHEN (bd.tenant = 1) THEN 1 WHEN (bd.tenant = 2) THEN 3 ELSE '-1'::integer END <= CASE WHEN (tenant = 1) THEN 1 WHEN (tenant = 2) THEN 3 ELSE '-1'::integer END))
                    Rows Removed by Filter: 126102
                    Buffers: shared hit=87484942
Planning:
Buffers: shared hit=108
Planning Time: 1.098 ms
Execution Time: 825476.536 ms

表和索引定義:

CREATE TABLE "public"."my_data"
(
  instance_id bigint NOT NULL,
  attribute_id varchar(100) NOT NULL,
  inserted timestamp NOT NULL,
  modified_record timestamp,
  effective date NOT NULL,
  discontinue date,
  status varchar(20) NOT NULL,
  tenant bigint NOT NULL,
  value varchar(4000),
  CONSTRAINT my_data_pkey PRIMARY KEY (instance_id,attribute_id,inserted)
)
;
CREATE INDEX my_data_iid ON "public"."my_data"(instance_id)
;
CREATE INDEX my_data_val ON "public"."my_data"
(
 tenant,
 attribute_id,
 value
)
;

我的問題是:

  • 為什麼在 PostgreSQL 上的查詢比在 Oracle 或 MariaDB 上慢得多?
  • 我們可以做些什麼來提高 PostgreSQL 的性能?(因為這是我們要關注的 DBMS)

按照a_horse_with_no_name的建議將 NOT IN 更改為 NOT EXISTS是解決方案:

not in ()Postgres 中的優化不如 Oracle 中的有效。嘗試將其重寫為不存在條件。

我替換了子查詢

AND (bd.INSTANCE_ID, bd.ATTRIBUTE_ID, bd.INSERTED) NOT IN (SELECT ... )

AND NOT exists 
 ( SELECT 1
   FROM my_data dc 
   WHERE dc.STATUS = 'DELETED' 
   AND   dc.INSTANCE_ID = bd.INSTANCE_ID
   AND   dc.ATTRIBUTE_ID = bd.ATTRIBUTE_ID
   AND   dc.MODIFIED_RECORD = bd.INSERTED
   AND   dc.INSERTED <= CURRENT_TIMESTAMP
   AND   dc.TENANT IN (1,2)
   AND   CASE WHEN bd.TENANT = 1 THEN 1 WHEN bd.TENANT = 2 THEN 3 ELSE -1 END <= CASE WHEN dc.TENANT = 1 THEN 1 WHEN dc.TENANT = 2 THEN 3 ELSE -1 END
 )

有了這個改變,現在只有大約 20 分鐘的查詢在幾分之一秒內執行。

在此之前,我嘗試了過濾索引,也是同一使用者建議的:

您可能還想嘗試過濾索引my_data(tenant_id, inserted) where status = 'DELETED' and modified_record is not null以使子查詢NOT IN ()更快

這(不改寫查詢)將時間縮短到約 15 秒,因此絕對需要牢記這一點。但是我在沒有索引的測試階段嘗試了改寫查詢,並且僅 NOT EXISTS 就足以實現大幅加速。

為了盡可能加快內部循環,您可以使用這個奇怪且非常專業的索引:

CREATE INDEX ON my_data (
  (CASE WHEN (tenant = 1) THEN 1 WHEN (tenant = 2) THEN 3 ELSE -1 END),
  inserted
)
WHERE modified_record IS NOT NULL
 AND tenant = ANY ('{1,2}'::bigint[])
 AND status = 'DELETED';

引用自:https://dba.stackexchange.com/questions/301057