PostgreSQL

A terribly slow query, and how to fix it

  • October 10, 2019
select
   user_id,
   count(id) as unread_count
from
   notifications
where
   is_read = false
   and user_id in(select(unnest('{200 user IDs}' :: bigint[])))
group by
   user_id;

The problem is that this query runs for about 1 minute, sometimes a bit longer. The table is 32 GB, and there is already an index on the user_id column.

Here is the execution plan:

HashAggregate  (cost=123354.81..123629.64 rows=27483 width=16) (actual time=90823.880..90823.972 rows=188 loops=1)
 Group Key: user_id
 ->  Nested Loop  (cost=2.32..123217.40 rows=27483 width=16) (actual time=0.184..90752.136 rows=48571 loops=1)
       ->  HashAggregate  (cost=1.76..2.76 rows=100 width=8) (actual time=0.146..0.577 rows=200 loops=1)
             Group Key: unnest(200 user IDs)
             ->  Result  (cost=0.00..0.51 rows=100 width=8) (actual time=0.021..0.073 rows=200 loops=1)
       ->  Index Scan using ix_notification_user_id on notification  (cost=0.56..1229.40 rows=275 width=16) (actual time=119.659..453.533 rows=243 loops=200)
             Index Cond: (200 user IDs)
             Filter: (NOT is_read)
             Rows Removed by Filter: 368
Planning time: 0.189 ms
Execution time: 90824.196 ms

I tried a solution using a temporary table: inserting the unnested values into a temp table and then joining against it. But performance did not improve at all.
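For reference, the temp-table variant looked roughly like this (a sketch; the temp table name is illustrative, and `'{200 user IDs}'` stands in for the real array literal):

```sql
-- Hypothetical sketch of the temp-table attempt; names are illustrative.
create temporary table tmp_user_ids (user_id bigint primary key);

insert into tmp_user_ids
select unnest('{200 user IDs}'::bigint[]);

select n.user_id, count(n.id) as unread_count
from notifications n
join tmp_user_ids t on t.user_id = n.user_id
where n.is_read = false
group by n.user_id;
```

The join shape changes, but the same index lookups and heap fetches still dominate, which is presumably why it did not help.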

I ran this query to look at the index bloat statistics:

select
   schemaname,
   tablename,
   reltuples::bigint,
   relpages::bigint,
   otta,
   round(case when otta = 0 then 0.0 else sml.relpages / otta::numeric end, 1) as tbloat,
   relpages::bigint - otta as wastedpages,
   bs*(sml.relpages-otta)::bigint as wastedbytes,
   pg_size_pretty((bs*(relpages-otta))::bigint) as wastedsize,
   iname,
   ituples::bigint,
   ipages::bigint,
   iotta,
   round(case when iotta = 0 or ipages = 0 then 0.0 else ipages / iotta::numeric end, 1) as ibloat,
   case
       when ipages < iotta then 0
       else ipages::bigint - iotta
   end as wastedipages,
   case
       when ipages < iotta then 0
       else bs*(ipages-iotta)
   end as wastedibytes
   --CASE WHEN ipages < iotta THEN pg_size_pretty(0) ELSE pg_size_pretty((bs*(ipages-iotta))::bigint) END AS wastedisize

   from (
   select
       schemaname,
       tablename,
       cc.reltuples,
       cc.relpages,
       bs,
       ceil((cc.reltuples*((datahdr + ma- (case when datahdr % ma = 0 then ma else datahdr % ma end))+ nullhdr2 + 4))/(bs-20::float)) as otta,
       coalesce(c2.relname, '?') as iname,
       coalesce(c2.reltuples, 0) as ituples,
       coalesce(c2.relpages, 0) as ipages,
       coalesce(ceil((c2.reltuples*(datahdr-12))/(bs-20::float)), 0) as iotta
       -- very rough approximation, assumes all cols

       from (
       select
           ma,
           bs,
           schemaname,
           tablename,
           (datawidth +(hdr + ma-
           (
               case
               when hdr % ma = 0 then ma
               else hdr % ma
           end)))::numeric as datahdr,
           (maxfracsum*(nullhdr + ma-
           (
               case
               when nullhdr % ma = 0 then ma
               else nullhdr % ma
           end))) as nullhdr2
       from
           (
           select
               schemaname,
               tablename,
               hdr,
               ma,
               bs,
               sum((1-null_frac)* avg_width) as datawidth,
               max(null_frac) as maxfracsum,
               hdr +(
               select
                   1 + count(*)/ 8
               from
                   pg_stats s2
               where
                   null_frac <> 0
                   and s2.schemaname = s.schemaname
                   and s2.tablename = s.tablename ) as nullhdr
           from
               pg_stats s,
               (
               select
                   (
                   select
                       current_setting('block_size')::numeric) as bs,
                   case
                       when substring(v, 12, 3) in ('8.0',
                       '8.1',
                       '8.2') then 27
                       else 23
                   end as hdr,
                   case
                       when v ~ 'mingw32' then 8
                       else 4
                   end as ma
               from
                   (
                   select
                       version() as v) as foo ) as constants
           group by
               1,
               2,
               3,
               4,
               5 ) as foo ) as rs
   join pg_class cc on
       cc.relname = rs.tablename
   join pg_namespace nn on
       cc.relnamespace = nn.oid
       and nn.nspname = rs.schemaname
   left join pg_index i on
       indrelid = cc.oid
   left join pg_class c2 on
       c2.oid = i.indexrelid ) as sml
where
   sml.relpages - otta > 0
   or ipages - iotta > 10
order by
   wastedbytes desc,
   wastedibytes desc;

Both the PK index and the user_id index report more than 5 GB of wastedsize and more than 500k wastedpages.

My question is: what is the solution? Is it purely an index problem that needs a REINDEX, or is there something else I am missing?
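If it were purely a bloat problem, the rebuild itself would just be (index name taken from the plan above; note that plain REINDEX locks out writes while it runs):

```sql
-- Rebuilds the bloated index from scratch; blocks writes for the duration.
reindex index ix_notification_user_id;
```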

I am not allowed to change the table's structure; I just need to optimize the query, somehow getting it from over 1 minute down to under 1 second.

After adding a partial index on user_id where is_read = false, the query time dropped by roughly 10-15 seconds. But that is obviously still far too long.
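The partial index described above would look roughly like this (the index name here is illustrative):

```sql
-- Only rows with is_read = false are indexed, keeping the index small
-- and matching the query's filter exactly.
create index ix_notification_user_id_unread
    on notification (user_id)
    where is_read = false;
```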

Edit: this table has 32.5 million rows in total. Running this query:

SELECT t.user_id, COALESCE(unread_count, 0) AS unread_count
FROM   unnest('{200 user_ids}'::bigint[]) t(user_id)
LEFT   JOIN LATERAL (
  SELECT count(*) AS unread_count
  FROM   notification n
  WHERE  n.user_id = t.user_id
  AND    n.is_read = false
  ) sub ON true
;     

results in this execution plan (interestingly, it ran for over a minute yesterday, but today it takes about 30 seconds or less):

Nested Loop Left Join  (cost=1209.05..120908.50 rows=100 width=16) (actual time=333.088..27260.557 rows=200 loops=1)
 Buffers: shared hit=1981 read=20396 dirtied=7
 I/O Timings: read=27023.896
 ->  Function Scan on unnest t  (cost=0.00..1.00 rows=100 width=8) (actual time=0.022..0.360 rows=200 loops=1)
 ->  Aggregate  (cost=1209.04..1209.05 rows=1 width=8) (actual time=136.292..136.293 rows=1 loops=200)
       Buffers: shared hit=1981 read=20396 dirtied=7
       I/O Timings: read=27023.896
       ->  Index Only Scan using ix_test on notification n  (cost=0.44..1208.29 rows=300 width=0) (actual time=2.153..136.170 rows=105 loops=200)
             Index Cond: (user_id = t.user_id)
             Heap Fetches: 21088
             Buffers: shared hit=1981 read=20396 dirtied=7
             I/O Timings: read=27023.896
Planning time: 0.135 ms
Execution time: 27260.745 ms

Your explain plan is a bit confusing, because it looks as though the index scan fetches the data for all 200 user_ids at once, yet is executed 200 times. But experimentation shows that is not what it is doing: each iteration of the nested loop fetches the data for one user_id from the list, not the whole list. So this is just a presentation quirk of the EXPLAIN output.

If you set track_io_timing = on and run EXPLAIN (ANALYZE, BUFFERS), I believe you will find that most of the time is spent reading data from disk. Reading 48,571 rows scattered randomly across 32 GB is not fast, unless all of the data is already cached in memory, or it sits on a blazingly fast PCIe SSD.
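Concretely, in a session that would look like this (re-using the original query, with `'{200 user IDs}'` standing in for the real array literal):

```sql
-- Enable per-node I/O timing for this session, then capture buffer
-- counts and read timings alongside the plan.
set track_io_timing = on;

explain (analyze, buffers)
select user_id, count(id) as unread_count
from notifications
where is_read = false
  and user_id in (select unnest('{200 user IDs}'::bigint[]))
group by user_id;
```

The "I/O Timings: read=..." lines in the second plan above are exactly what this setting produces.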

Short of throwing some serious hardware at it, your best bet is to get the query to use an index-only scan. For the query you show, that would require an index like this:

create index on notifications (user_id , is_read, id);

VACUUM the table before trying it. If it works, you will need to think about how to keep the table well vacuumed, because the default autovacuum settings may not be enough.
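For example, a one-off manual VACUUM plus more aggressive per-table autovacuum settings (the values below are illustrative, not a recommendation; tune them to your write rate):

```sql
-- One-off cleanup: updates the visibility map so the index-only scan
-- can avoid heap fetches.
vacuum analyze notifications;

-- Illustrative per-table autovacuum settings: vacuum after ~1% of rows
-- change instead of the default 20%, with a shorter cost delay.
alter table notifications set (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay = 2
);
```

The "Heap Fetches: 21088" in your second plan is the symptom this addresses: every heap fetch is a page the visibility map could not vouch for.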

I would not worry about the reported bloat. That query (where did you get it?) reports a large number of wasted bytes in indexes even on a freshly reindexed table. Also, the wasted pages it reports are not completely empty pages, but wasted bytes divided by the page size. That seems silly to me.

Quoted from: https://dba.stackexchange.com/questions/250508