PostgreSQL

Execution performance problem with two similar queries

  • April 2, 2022

I'm running into performance problems on PostgreSQL executing a simple query against a table with roughly 3 million rows, joined to a table with roughly 120 rows.

If I run the query against the larger table alone (without the filter), the result comes back immediately, but with the join, the same result takes up to 2 minutes to return.
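For reference, a minimal sketch of the unfiltered baseline described above (the exact statement is not shown in the original post):

-- hypothetical reconstruction: aggregate over the fact table alone, no join
SELECT
   foi.fk_dim_product, count(1)
FROM evino_bi.fact_order_item AS foi
GROUP BY foi.fk_dim_product;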

Here is the query:

SELECT
   foi.fk_dim_product, count(1)
FROM evino_bi.fact_order_item AS foi
INNER JOIN dim_order_item_status AS dois
   ON dois.id_dim_order_item_status = foi.fk_dim_order_item_status
WHERE dois.is_reserved = '1'
GROUP BY foi.fk_dim_product;

Running EXPLAIN (ANALYZE, BUFFERS) returns the following:

HashAggregate  (cost=1379364.80..1379391.01 rows=2621 width=4) (actual time=84822.667..84822.863 rows=630 loops=1)
 Group Key: foi.fk_dim_product
 Buffers: shared hit=181461 read=1061877
 ->  Hash Join  (cost=6.38..1360785.99 rows=3715762 width=4) (actual time=0.180..84764.538 rows=96703 loops=1)
       Hash Cond: (foi.fk_dim_order_item_status = dois.id_dim_order_item_status)
       Buffers: shared hit=181461 read=1061877
       ->  Seq Scan on fact_order_item foi  (cost=0.00..1301725.54 rows=5839054 width=8) (actual time=0.002..81484.109 rows=5837655 loops=1)
             Buffers: shared hit=181458 read=1061877
       ->  Hash  (cost=5.06..5.06 rows=105 width=4) (actual time=0.053..0.053 rows=70 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 11kB
             Buffers: shared hit=3
             ->  Seq Scan on dim_order_item_status dois  (cost=0.00..5.06 rows=105 width=4) (actual time=0.005..0.038 rows=70 loops=1)
                   Filter: (is_reserved = 1)
                   Rows Removed by Filter: 40
                   Buffers: shared hit=3
Planning time: 0.623 ms
Execution time: 84836.100 ms

What puzzles me most is that the following query runs in milliseconds and returns the same data:

SELECT
   foi.fk_dim_product, count(1)
FROM evino_bi.fact_order_item foi
INNER JOIN dim_order_item_status dois
 ON dois.id_dim_order_item_status = foi.fk_dim_order_item_status
WHERE dois.is_reserved || '' = '1'
GROUP BY 1

Here is the EXPLAIN (ANALYZE, BUFFERS) for the second statement:

HashAggregate  (cost=555597.70..555623.91 rows=2621 width=4) (actual time=249.523..249.673 rows=630 loops=1)
 Group Key: foi.fk_dim_product
 Buffers: shared hit=134117
 ->  Nested Loop  (cost=8172.60..555420.76 rows=35388 width=4) (actual time=2.971..219.564 rows=96860 loops=1)
       Buffers: shared hit=134117
       ->  Seq Scan on dim_order_item_status dois  (cost=0.00..6.30 rows=1 width=4) (actual time=0.011..0.101 rows=70 loops=1)
             Filter: (((is_reserved)::text || ''::text) = '1'::text)
             Rows Removed by Filter: 40
             Buffers: shared hit=3
       ->  Bitmap Heap Scan on fact_order_item foi  (cost=8172.60..553329.08 rows=208538 width=8) (actual time=1.205..2.484 rows=1384 loops=70)
             Recheck Cond: (fk_dim_order_item_status = dois.id_dim_order_item_status)
             Heap Blocks: exact=132362
             Buffers: shared hit=134114
             ->  Bitmap Index Scan on fact_order_item_fk_dim_order_item_status  (cost=0.00..8120.47 rows=208538 width=0) (actual time=0.467..0.467 rows=3903 loops=70)
                   Index Cond: (fk_dim_order_item_status = dois.id_dim_order_item_status)
                   Buffers: shared hit=1752
Planning time: 0.691 ms
Execution time: 249.917 ms

So, is PostgreSQL not planning my query well? Is there any performance tuning I can do on my server to avoid the problem in the first statement and help PostgreSQL plan better?

Also, why does the second statement run so absurdly fast?

Edit

I have updated the question with the requested information.

Edit 2

Here is the \d output for both tables. I have hidden some columns of fact_order_item because the table has nearly 150 columns, most of them plain numeric values.

dim_order_item_status:

   Table "evino_bi.dim_order_item_status"
         Column          |          Type          |                                        Modifiers
--------------------------+------------------------+------------------------------------------------------------------------------------------
id_dim_order_item_status | integer                | not null default nextval('dim_order_item_status_id_dim_order_item_status_seq'::regclass)
src_id_order_item_status | integer                |
name                     | character varying(100) |
name_pt                  | character varying(100) |
is_reserved              | smallint               |
is_problem               | smallint               |
payment_status           | character varying(15)  |
macro_status             | character varying(100) |
macro_status_pt          | character varying(100) |
is_solid                 | smallint               |
Indexes:
   "dim_order_item_status_pkey" PRIMARY KEY, btree (id_dim_order_item_status)
   "src_id_order_item_status_idx" UNIQUE, btree (src_id_order_item_status)

fact_order_item:

                                                          Table "evino_bi.fact_order_item"
                 Column                  |            Type             |                                  Modifiers
------------------------------------------+-----------------------------+------------------------------------------------------------------------------
id_fact_order_item                       | integer                     | not null default nextval('fact_order_item_id_fact_order_item_seq'::regclass)
src_id_order_item                        | integer                     |
src_fk_order                             | integer                     |
order_increment_id                       | character varying(50)       |
order_type                               | character varying(50)       |
is_instant_buy                           | smallint                    |
nfe_number                               | integer                     |
nfe_serie                                | integer                     |
nfe_key                                  | character varying(50)       |
nfe_created_at                           | integer                     |
nfe_created_at_time                      | integer                     |
nf_created_at_datetime                   | timestamp without time zone |
fk_dim_city                              | integer                     |
fk_dim_product                           | integer                     |
fk_dim_product_bundle                    | integer                     |
fk_dim_customer                          | integer                     |
fk_dim_logistics_provider                | integer                     |
fk_dim_channel_last_click                | integer                     |
fk_dim_source_medium_last_click          | integer                     |
fk_dim_content_last_click                | integer                     |
fk_dim_campaign_last_click               | integer                     |
fk_dim_channel_lead                      | integer                     |
fk_dim_source_medium_lead                | integer                     |
fk_dim_content_lead                      | integer                     |
fk_dim_campaign_lead                     | integer                     |
fk_dim_order_item_status                 | integer                     |
fk_dim_payment_method                    | integer                     |
fk_dim_subscription                      | integer                     |
fk_dim_order_volume_status               | integer                     |
fk_dim_order_volume_micro_status         | integer                     |
fk_dim_sales_rule                        | integer                     |
fk_dim_region                            | integer                     |
platform                                 | character varying(40)       |
created_at                               | integer                     |
created_at_time                          | integer                     |
created_at_datetime                      | timestamp without time zone |
updated_at                               | integer                     |
updated_at_time                          | integer                     |
updated_at_datetime                      | timestamp without time zone |
payment_confirmed_at                     | integer                     |
payment_confirmed_at_time                | integer                     |
payment_confirmed_at_datetime            | timestamp without time zone |
cm2                                      | numeric(10,4)               |
etl_updated_at                           | timestamp without time zone |
variations                               | json                        |
Indexes:
   "fact_order_item_pkey" PRIMARY KEY, btree (id_fact_order_item)
   "fk_fact_order_item_src_id_order_item" UNIQUE, btree (src_id_order_item)
   "fact_order_item_fk_dim_campaign_last_click" btree (fk_dim_campaign_last_click)
   "fact_order_item_fk_dim_campaign_lead" btree (fk_dim_campaign_lead)
   "fact_order_item_fk_dim_channel_last_click" btree (fk_dim_channel_last_click)
   "fact_order_item_fk_dim_channel_lead" btree (fk_dim_channel_lead)
   "fact_order_item_fk_dim_city" btree (fk_dim_city)
   "fact_order_item_fk_dim_content_last_click" btree (fk_dim_content_last_click)
   "fact_order_item_fk_dim_content_lead" btree (fk_dim_content_lead)
   "fact_order_item_fk_dim_customer" btree (fk_dim_customer)
   "fact_order_item_fk_dim_logistics_provider" btree (fk_dim_logistics_provider)
   "fact_order_item_fk_dim_order_item_status" btree (fk_dim_order_item_status)
   "fact_order_item_fk_dim_order_volume_status" btree (fk_dim_order_volume_status)
   "fact_order_item_fk_dim_payment_method" btree (fk_dim_payment_method)
   "fact_order_item_fk_dim_product" btree (fk_dim_product)
   "fact_order_item_fk_dim_product_bundle" btree (fk_dim_product_bundle)
   "fact_order_item_fk_dim_region" btree (fk_dim_region)
   "fact_order_item_fk_dim_sales_rule" btree (fk_dim_sales_rule)
   "fact_order_item_fk_dim_source_medium_last_click" btree (fk_dim_source_medium_last_click)
   "fact_order_item_fk_dim_source_medium_lead" btree (fk_dim_source_medium_lead)
   "fact_order_item_fk_dim_subscription" btree (fk_dim_subscription)
   "fk_fact_order_item_created_at" btree (created_at)
   "fk_fact_order_item_delivered_at" btree (delivered_at)
   "fk_fact_order_item_nfe_number_id" btree (nfe_number)
   "fk_fact_order_item_order_increment_id" btree (order_increment_id)
   "fk_fact_order_item_payment_confirmed_at" btree (payment_confirmed_at)
   "fk_fact_order_item_ready_for_picking_at" btree (ready_for_picking_at)
   "fk_fact_order_item_shipped_at" btree (shipped_at)
   "fk_fact_order_item_src_fk_order" btree (src_fk_order)
   "fk_fact_order_item_updated_at" btree (updated_at)

Edit 3

As requested, here are my database specs:

Amazon RDS SSD db.m4.xlarge, PostgreSQL 9.5.2, Memory 16 GB, 4 cores.

These settings should be at their defaults, since I have not tuned any of them:

cpu_tuple_cost: 0.01
random_page_cost: 4
shared_buffers: 4058056kB
work_mem: 64000kB
effective_cache_size: 8116112kB
maintenance_work_mem: 259MB
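For reference, these values can be read back from a live session with a standard catalog query:

-- show the current value and unit of each setting listed above
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('cpu_tuple_cost', 'random_page_cost', 'shared_buffers',
               'work_mem', 'effective_cache_size', 'maintenance_work_mem');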

You have two Seq Scans in that query:

  1. One over dim_order_item_status.is_reserved
  2. Another over foi.fk_dim_order_item_status

Do either of these have an index? If not, create the indexes and VACUUM ANALYZE the tables.
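A minimal sketch of that suggestion, assuming the dimension-side index is the missing one (the index name is illustrative; per Edit 2, fact_order_item already has fact_order_item_fk_dim_order_item_status):

CREATE INDEX dim_order_item_status_is_reserved_idx  -- illustrative name
    ON dim_order_item_status (is_reserved);

-- refresh planner statistics (and reclaim dead tuples) on both tables
VACUUM ANALYZE dim_order_item_status;
VACUUM ANALYZE fact_order_item;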

A few other notes:

  1. In PostgreSQL, we don't use 1 for booleans:

     ALTER TABLE dim_order_item_status
       ALTER COLUMN is_reserved SET DATA TYPE bool
       USING (is_reserved = 1);  -- USING is required: there is no implicit smallint-to-bool cast

  2. count(1) is better written as count(*).

As for why the second query is faster: the join estimate is what slows the first one down, and the second plan pushes the index scan further down. My assumption is that your statistics are bad for fk_dim_order_item_status = dois.id_dim_order_item_status. Normally we could verify this when you run EXPLAIN ANALYZE rather than just EXPLAIN. In any case, if the problem still persists after you create the indexes above, run VACUUM ANALYZE, or at least ANALYZE, on those tables.
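If you want to see what the planner is working from, the per-column statistics live in the standard pg_stats view; a minimal sketch:

-- planner statistics for the join column; stale or skewed values
-- here would explain the bad row estimate on the hash join
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'evino_bi'
  AND tablename  = 'fact_order_item'
  AND attname    = 'fk_dim_order_item_status';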

If you come back, we need:

  1. VACUUM ANALYZE for all queries.
  2. \d dim_order_item_status
  3. \d dim_order_item_foi

In the faster plan, it underestimates the number of rows returned by the seq scan by a factor of 70, and it overestimates the rows returned by the bitmap scan by a factor of about 50. These two estimation errors mostly cancel out, leaving you with a good query.

The reason for the overestimate seems to be that items that are is_reserved in one table are preferentially rare in the other table. PostgreSQL's planner has no way of knowing this, and there is no satisfying solution for it. I would go with what you have already done: intentionally introducing one misestimate to compensate for another misestimate that cannot be fixed.
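Concretely, that is the pattern already used in the second statement: concatenating an empty string hides is_reserved from the column statistics, so the planner estimates a single matching status row and settles on the fast nested-loop plan (restated here with the count(*) spelling from the first answer):

SELECT
   foi.fk_dim_product, count(*)
FROM evino_bi.fact_order_item AS foi
INNER JOIN dim_order_item_status AS dois
   ON dois.id_dim_order_item_status = foi.fk_dim_order_item_status
WHERE dois.is_reserved || '' = '1'  -- deliberately opaque to the planner's statistics
GROUP BY foi.fk_dim_product;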

Source: https://dba.stackexchange.com/questions/162735