Terribly slow query — how can I fix it?
select user_id, count(id) as unread_count
from notifications
where is_read = false
  and user_id in (select unnest('{200 user IDs}'::bigint[]))
group by user_id;
The problem is that this query takes a minute to run, sometimes a bit longer. The table is 32 GB, and there is already an index on the user_id column.
Here is the execution plan:
HashAggregate  (cost=123354.81..123629.64 rows=27483 width=16) (actual time=90823.880..90823.972 rows=188 loops=1)
  Group Key: user_id
  ->  Nested Loop  (cost=2.32..123217.40 rows=27483 width=16) (actual time=0.184..90752.136 rows=48571 loops=1)
        ->  HashAggregate  (cost=1.76..2.76 rows=100 width=8) (actual time=0.146..0.577 rows=200 loops=1)
              Group Key: unnest(200 user IDs)
              ->  Result  (cost=0.00..0.51 rows=100 width=8) (actual time=0.021..0.073 rows=200 loops=1)
        ->  Index Scan using ix_notification_user_id on notification  (cost=0.56..1229.40 rows=275 width=16) (actual time=119.659..453.533 rows=243 loops=200)
              Index Cond: (200 user IDs)
              Filter: (NOT is_read)
              Rows Removed by Filter: 368
Planning time: 0.189 ms
Execution time: 90824.196 ms
I tried a solution using a temporary table: insert the unnested values into a temp table, then join against it. But performance did not improve at all.
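For reference, the temp-table variant I tried looked roughly like this (a sketch; the table and alias names here are illustrative, not the exact code I ran):

```sql
-- Sketch of the temp-table approach: materialize the IDs, then join.
-- tmp_user_ids is a hypothetical name for illustration.
create temporary table tmp_user_ids (user_id bigint primary key);

insert into tmp_user_ids
select unnest('{200 user IDs}'::bigint[]);

analyze tmp_user_ids;  -- give the planner real row counts for the temp table

select n.user_id, count(n.id) as unread_count
from notifications n
join tmp_user_ids t using (user_id)
where n.is_read = false
group by n.user_id;
```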
I ran this query to look at the index statistics:
select
  schemaname, tablename, reltuples::bigint, relpages::bigint, otta,
  round(case when otta = 0 then 0.0 else sml.relpages / otta::numeric end, 1) as tbloat,
  relpages::bigint - otta as wastedpages,
  bs*(sml.relpages-otta)::bigint as wastedbytes,
  pg_size_pretty((bs*(relpages-otta))::bigint) as wastedsize,
  iname, ituples::bigint, ipages::bigint, iotta,
  round(case when iotta = 0 or ipages = 0 then 0.0 else ipages / iotta::numeric end, 1) as ibloat,
  case when ipages < iotta then 0 else ipages::bigint - iotta end as wastedipages,
  case when ipages < iotta then 0 else bs*(ipages-iotta) end as wastedibytes
  --case when ipages < iotta then pg_size_pretty(0) else pg_size_pretty((bs*(ipages-iotta))::bigint) end as wastedisize
from (
  select
    schemaname, tablename, cc.reltuples, cc.relpages, bs,
    ceil((cc.reltuples*((datahdr + ma - (case when datahdr % ma = 0 then ma else datahdr % ma end)) + nullhdr2 + 4))/(bs-20::float)) as otta,
    coalesce(c2.relname, '?') as iname,
    coalesce(c2.reltuples, 0) as ituples,
    coalesce(c2.relpages, 0) as ipages,
    coalesce(ceil((c2.reltuples*(datahdr-12))/(bs-20::float)), 0) as iotta  -- very rough approximation, assumes all cols
  from (
    select
      ma, bs, schemaname, tablename,
      (datawidth + (hdr + ma - (case when hdr % ma = 0 then ma else hdr % ma end)))::numeric as datahdr,
      (maxfracsum*(nullhdr + ma - (case when nullhdr % ma = 0 then ma else nullhdr % ma end))) as nullhdr2
    from (
      select
        schemaname, tablename, hdr, ma, bs,
        sum((1-null_frac) * avg_width) as datawidth,
        max(null_frac) as maxfracsum,
        hdr + (
          select 1 + count(*) / 8
          from pg_stats s2
          where null_frac <> 0
            and s2.schemaname = s.schemaname
            and s2.tablename = s.tablename
        ) as nullhdr
      from pg_stats s, (
        select
          (select current_setting('block_size')::numeric) as bs,
          case when substring(v, 12, 3) in ('8.0', '8.1', '8.2') then 27 else 23 end as hdr,
          case when v ~ 'mingw32' then 8 else 4 end as ma
        from (select version() as v) as foo
      ) as constants
      group by 1, 2, 3, 4, 5
    ) as foo
  ) as rs
  join pg_class cc on cc.relname = rs.tablename
  join pg_namespace nn on cc.relnamespace = nn.oid and nn.nspname = rs.schemaname
  left join pg_index i on indrelid = cc.oid
  left join pg_class c2 on c2.oid = i.indexrelid
) as sml
where sml.relpages - otta > 0 or ipages - iotta > 10
order by wastedbytes desc, wastedibytes desc;
Both the PK index and the user_id index show more than 5 GB of wastedsize and more than 500k wastedpages. My question is: what is the solution here? Is it purely an index problem that needs a reindex, or is there something else I'm missing? I am not allowed to change the table's structure; I just need to optimize it somehow, from over a minute down to under a second.
After adding a partial index on user_id where is_read = false, the query time dropped by roughly 10-15 seconds. But it obviously still takes far too long.
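For clarity, the partial index described above would look something like this (the index name is an assumption):

```sql
-- Partial index covering only unread rows, so it stays much smaller
-- than a full index on user_id. The index name is illustrative.
create index ix_notifications_user_id_unread
  on notifications (user_id)
  where is_read = false;
```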
Edit: there are 32.5 million rows in this table in total. Running this query:
SELECT t.user_id, COALESCE(unread_count, 0) AS unread_count
FROM unnest('{200 user_ids}'::bigint[]) t(user_id)
LEFT JOIN LATERAL (
  SELECT count(*) AS unread_count
  FROM notification n
  WHERE n.user_id = t.user_id
    AND n.is_read = false
) sub ON true;
produces this execution plan (interestingly, yesterday it took over a minute; today it runs in about 30 seconds or less):
Nested Loop Left Join  (cost=1209.05..120908.50 rows=100 width=16) (actual time=333.088..27260.557 rows=200 loops=1)
  Buffers: shared hit=1981 read=20396 dirtied=7
  I/O Timings: read=27023.896
  ->  Function Scan on unnest t  (cost=0.00..1.00 rows=100 width=8) (actual time=0.022..0.360 rows=200 loops=1)
  ->  Aggregate  (cost=1209.04..1209.05 rows=1 width=8) (actual time=136.292..136.293 rows=1 loops=200)
        Buffers: shared hit=1981 read=20396 dirtied=7
        I/O Timings: read=27023.896
        ->  Index Only Scan using ix_test on notification n  (cost=0.44..1208.29 rows=300 width=0) (actual time=2.153..136.170 rows=105 loops=200)
              Index Cond: (user_id = t.user_id)
              Heap Fetches: 21088
              Buffers: shared hit=1981 read=20396 dirtied=7
              I/O Timings: read=27023.896
Planning time: 0.135 ms
Execution time: 27260.745 ms
Your explain plan is a bit confusing, because it looks as if the index scan fetches the data for all 200 user_ids at once, yet is executed 200 times. But experimentation shows that is not what it is doing: each iteration of the nested loop fetches the data for one user_id from the list, not the whole list. So this is just a presentation quirk of the EXPLAIN output.
If you set track_io_timing = on and run EXPLAIN (ANALYZE, BUFFERS), I believe you will find most of the time is spent reading data from disk. Reading 48571 rows scattered randomly across 32 GB is not fast unless all the data is already cached in memory, or it lives on a blazingly fast PCIe SSD. Short of throwing serious hardware at it, your best bet is to get it to use an index-only scan. For the query you show, that would need an index like this:
create index on notifications (user_id, is_read, id);
Vacuum the table before trying it. If it works, you will then need to think about how to keep the table well vacuumed, since the default autovacuum settings may not be enough.
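One way to keep the table more aggressively vacuumed is per-table autovacuum storage parameters; a minimal sketch, where the specific thresholds are assumptions you would need to tune for your workload:

```sql
-- Make autovacuum fire much earlier on this table than the global default.
-- The values below are illustrative, not recommendations.
alter table notifications set (
  autovacuum_vacuum_scale_factor = 0.01,    -- vacuum after ~1% of rows change
  autovacuum_analyze_scale_factor = 0.005   -- analyze after ~0.5% of rows change
);
```

Keeping the visibility map current this way is what lets the index-only scan avoid heap fetches.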
I wouldn't worry about the reported bloat. That query (where did you get it?) reports large amounts of wasted bytes in indexes even on a freshly reindexed table. Also, the wasted pages it reports are not completely empty pages, just the wasted bytes divided by the page size. That seems silly to me.
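If you just want a sanity check on sizes, the built-in size functions are more trustworthy than that bloat estimate; a minimal sketch, assuming the table and index names from your plans:

```sql
-- Report on-disk sizes using built-in functions rather than a bloat heuristic.
select
  pg_size_pretty(pg_relation_size('notifications'))            as table_size,
  pg_size_pretty(pg_indexes_size('notifications'))             as total_index_size,
  pg_size_pretty(pg_relation_size('ix_notification_user_id'))  as user_id_index_size;
```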