PostgreSQL

What did I miss in EXPLAIN that would have hinted at a heavy load on the server?

  • May 18, 2021

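I reviewed the EXPLAIN ANALYZE output of a query and found several subqueries that were scanning entire tables. They did so only to get the latest record from those tables for each appointment.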

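After some research, I decided it made sense to use lateral joins for those subqueries to significantly reduce the amount of data scanned. EXPLAIN ANALYZE suggested that the cost of the whole query using lateral joins was about a quarter of the original query. So we went ahead.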

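Within two hours of deploying the query change, our database server was maxed out at 100% and essentially unresponsive. Reverting the query to the subqueries that scan the tables brought CPU usage back to normal levels. Our database runs on AWS RDS for PostgreSQL on a t2.xlarge instance. Performance Insights showed a significant increase in ClientWrite. See the database load graph.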

The query using subqueries, with its EXPLAIN output: https://explain.depesz.com/s/wES6

select appointments.*, 
   reportSnapshots.created_at as latestSnapshotTime, 
   responses.created_at as latestResponseTime  
from appointments 
   left join (
       SELECT DISTINCT ON  (appointmentId) created_at, appointmentId
           FROM reportSnapshots
           ORDER BY appointmentId, created_at DESC
   ) reportSnapshots on appointments.id = reportSnapshots.appointmentId 
   left join (
       SELECT DISTINCT ON  (appointmentId) created_at, appointmentId
           FROM responses
           ORDER BY appointmentId, created_at DESC
   ) responses on appointments.id = responses.appointmentId 
where appointments.organizationId = 16 and appointments.locationId = '51' 
   and appointments.cancelled = false and appointments.filteredIn = true 
   and start between '2021-05-04T00:00:00-06:00' and '2021-05-04T23:59:59-06:00' 
   and appointments.locationId in (61,60,140,53,138,130,133,131,55,51,100) 
group by appointments.id, 
   reportSnapshots.created_at, 
   responses.created_at
order by start ASC, start ASC, id ASC 
limit 100

The query using lateral joins, with its EXPLAIN output: https://explain.depesz.com/s/B2vp

select appointments.*, 
   reportSnapshots.created_at as latestSnapshotTime, 
   responses.created_at as latestResponseTime  
from appointments 
   left join lateral (
       SELECT DISTINCT ON  (appointmentId) created_at, appointmentId
           FROM reportSnapshots
       WHERE reportSnapshots.appointmentId = appointments.id
           ORDER BY appointmentId, created_at DESC
   ) reportSnapshots on appointments.id = reportSnapshots.appointmentId 
   left join lateral (
       SELECT DISTINCT ON  (appointmentId) created_at, appointmentId
           FROM responses
       WHERE responses.appointmentId = appointments.id
           ORDER BY appointmentId, created_at DESC
   ) responses on appointments.id = responses.appointmentId 
where appointments.organizationId = 16 and appointments.locationId = '51' 
   and appointments.cancelled = false and appointments.filteredIn = true 
   and start between '2021-05-04T00:00:00-06:00' and '2021-05-04T23:59:59-06:00' 
   and appointments.locationId in (61,60,140,53,138,130,133,131,55,51,100) 
group by appointments.id, 
   reportSnapshots.created_at, 
   responses.created_at
order by start ASC, start ASC, id ASC 
limit 100

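Clearly, I don't understand what the EXPLAIN output is telling me about the query. What did I miss that would have told me that the lateral join query would put a higher load on the database despite its lower cost?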

"EXPLAIN ANALYZE suggested that the cost of the whole query using lateral joins was about a quarter of the original query."

EXPLAIN (estimated cost) suggests 40,089.36 vs. 189,883.92 unicorn points.

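But EXPLAIN ANALYZE (measured actual execution time) disagrees and shows 2,502.031 ms vs. 1,835.193 ms, so the lateral version is about 1/3 slower. There can be many reasons for estimates being off target, most notably cost settings and statistics.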
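As a rough sketch of how one might check whether stale statistics or cost settings play a part: refresh the statistics for the three tables from the question, look at the planner's cost settings, then compare estimated and actual row counts per plan node. The table names come from the queries above. The shortened SELECT is only a stand-in for the real query. Note that EXPLAIN ANALYZE actually executes the statement.

```sql
-- Refresh planner statistics for the tables involved (names from the question).
ANALYZE appointments;
ANALYZE reportsnapshots;
ANALYZE responses;

-- Inspect the cost settings the planner currently uses.
SHOW random_page_cost;
SHOW seq_page_cost;
SHOW effective_cache_size;

-- Compare the estimated "rows=" with the actual "rows=" for each plan node.
-- BUFFERS adds block I/O counts per node.
-- The query below is a shortened stand-in, not the full query from the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.id
FROM   appointments a
WHERE  a.organizationid = 16
AND    a.locationid = '51'
LIMIT  100;
```

Large gaps between estimated and actual row counts usually point to the nodes where the cost estimate went wrong.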

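That said, the query can probably be much faster (by orders of magnitude). We only have limited information, but I think you want something like this: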

SELECT a.*
    , rs.created_at AS latestsnapshottime  -- maybe use COALESCE?
    , rp.created_at AS latestresponsetime
FROM   appointments a
LEFT   JOIN LATERAL (
  SELECT rs.created_at
  FROM   reportsnapshots rs
  WHERE  rs.appointmentid = a.id
  ORDER  BY rs.created_at DESC  -- NULLS LAST ?
  LIMIT  1
  ) rs ON true
LEFT   JOIN LATERAL (
  SELECT rp.created_at
  FROM   responses rp
  WHERE  rp.appointmentid = a.id
  ORDER  BY rp.created_at DESC  -- NULLS LAST ?
  LIMIT  1
  ) rp ON true
WHERE  a.organizationid = 16
AND    a.locationid = '51'
AND    a.cancelled = FALSE
AND    a.filteredin = TRUE
AND    a.start BETWEEN '2021-05-04T00:00:00-06:00' AND '2021-05-04T23:59:59-06:00'
AND    a.locationid IN (61,60,140,53,138,130,133,131,55,51,100)
-- GROUP  BY a.id, rs.created_at, rp.created_at  -- not needed, I guess
ORDER  BY a.start, a.id
LIMIT  100;

DISTINCT ON is a great tool, but for different situations.

Retrieving a single row from each LATERAL subquery, as I suggest, can use an index and be very fast. Ideally, you have these indexes:

reportsnapshots (appointmentid, created_at DESC NULLS LAST)
responses       (appointmentid, created_at DESC NULLS LAST)

If created_at is defined NOT NULL, a simpler index on (appointmentid, created_at) is just as good (and preferable).
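Spelled out as DDL, the two index variants mentioned above might look like the following sketch. The index names are made up for illustration. Whether created_at is nullable is not known from the question, so pick one variant accordingly.

```sql
-- Index names are hypothetical; table and column names come from the question.

-- Variant 1: created_at may be NULL.
-- Matches a subquery that uses ORDER BY created_at DESC NULLS LAST.
CREATE INDEX reportsnapshots_appointmentid_created_at_idx
    ON reportsnapshots (appointmentid, created_at DESC NULLS LAST);
CREATE INDEX responses_appointmentid_created_at_idx
    ON responses (appointmentid, created_at DESC NULLS LAST);

-- Variant 2: created_at is defined NOT NULL.
-- A plain index can be scanned backwards to serve ORDER BY created_at DESC.
-- CREATE INDEX reportsnapshots_appointmentid_created_at_idx
--     ON reportsnapshots (appointmentid, created_at);
-- CREATE INDEX responses_appointmentid_created_at_idx
--     ON responses (appointmentid, created_at);
```

On a busy production table, CREATE INDEX CONCURRENTLY builds the index without blocking writes, at the price of a slower build.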

Plus an index on the outer table. The one used now (appointments_organizationId_status_start_idx) looks decent. But I suspect more potential, depending on undisclosed information.

You probably don't need to aggregate in the outer query at all, since both subqueries return a single row each (even in your original query).

Alternatively, use plain correlated subqueries with max() for your simple case. That may be even faster:

SELECT a.*
    , (SELECT max(rs.created_at)
       FROM   reportsnapshots rs
       WHERE  rs.appointmentid = a.id) AS latestsnapshottime
    , (SELECT max(rp.created_at)
       FROM   responses rp
       WHERE  rp.appointmentid = a.id) AS latestresponsetime
FROM   appointments a
WHERE  ...
LIMIT  100;

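Aside 1: BETWEEN is typically not a good fit for timestamps. It includes both bounds, so an upper bound of 23:59:59 misses rows that fall in the last second of the day.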
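A common alternative is a half-open range that includes the lower bound and excludes the upper one. The sketch below applies it to the date from the question and leaves out the other filters for brevity.

```sql
-- Half-open range: lower bound included, upper bound excluded.
-- Covers all of 2021-05-04 at UTC-06:00, fractional seconds included.
SELECT a.*
FROM   appointments a
WHERE  a.start >= '2021-05-04T00:00:00-06:00'
AND    a.start <  '2021-05-05T00:00:00-06:00'  -- start of the next day
ORDER  BY a.start, a.id
LIMIT  100;
```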

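Aside 2: In your first query plan, I see "Sort Method: external merge Disk: 3,856kB" and "Sort Method: external merge Disk: 17,752kB", which indicates a lack of work_mem. The same problem does not show up in the second plan, nor would it with my query. But have a look at your server configuration.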
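To see whether more work_mem would keep such a sort in memory, one could experiment in a single session along these lines. The value '32MB' is only an illustrative assumption, not a recommendation. An in-memory sort generally needs more memory than the on-disk size reported in the plan.

```sql
-- Show the current memory limit per sort or hash operation.
SHOW work_mem;

-- Raise it for the current session only ('32MB' is an illustrative value).
SET work_mem = '32MB';

-- Re-run one of the sorting subqueries from the first query and check whether
-- the plan reports "Sort Method: quicksort  Memory: ..." instead of
-- "Sort Method: external merge  Disk: ...".
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (appointmentid) created_at, appointmentid
FROM   responses
ORDER  BY appointmentid, created_at DESC;

-- Return to the server default.
RESET work_mem;
```

work_mem applies per sort or hash operation, so a high server-wide value multiplied by many concurrent queries can exhaust memory. On RDS, the server-wide setting is changed through the DB parameter group.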

Quoted from: https://dba.stackexchange.com/questions/291116