Optimizing the performance of a basic SELECT count(*) query on a large PostgreSQL 9.6.5 table

September 13, 2018

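I have a Python app called "Links" where users can add social posts called "publicreplies".
The app sees decent traffic: the table containing the public replies has grown very large over the last 12 months (about 83 million rows and counting).
A basic SELECT query on the table links_publicreply is showing up in slow_log. It takes upwards of 500 ms, which is roughly 10x slower than most other PostgreSQL operations I see.
The query is: select count(*) from links_publicreply where submitted_on >= current_date - interval '1 day';. Pretty basic stuff.
The EXPLAIN ANALYZE results are here: https://explain.depesz.com/s/RJ9b
Here is the output of \d links_publicreply: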
                                     Table "public.links_publicreply"
    Column      |           Type           |                           Modifiers                            
-----------------+--------------------------+----------------------------------------------------------------
id              | integer                  | not null default nextval('links_publicreply_id_seq'::regclass)
submitted_by_id | integer                  | not null
answer_to_id    | integer                  | not null
submitted_on    | timestamp with time zone | not null
description     | text                     | not null
category        | character varying(20)    | not null
seen            | boolean                  | not null
abuse           | boolean                  | not null
device          | character varying(10)    | default '1'::character varying
Indexes:
   "links_publicreply_pkey" PRIMARY KEY, btree (id)
   "links_publicreply_answer_to_id" btree (answer_to_id)
   "links_publicreply_submitted_by_id" btree (submitted_by_id)
Foreign-key constraints:
   "links_publicreply_answer_to_id_fkey" FOREIGN KEY (answer_to_id) REFERENCES links_link(id) DEFERRABLE INITIALLY DEFERRED
   "links_publicreply_submitted_by_id_fkey" FOREIGN KEY (submitted_by_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
   TABLE "links_report" CONSTRAINT "links_report_which_publicreply_id_fkey" FOREIGN KEY (which_publicreply_id) REFERENCES links_publicreply(id) DEFERRABLE INITIALLY DEFERRED
   TABLE "links_seen" CONSTRAINT "links_seen_which_reply_id_fkey" FOREIGN KEY (which_reply_id) REFERENCES links_publicreply(id) DEFERRABLE INITIALLY DEFERRED
   TABLE "links_link" CONSTRAINT "publicreplyposter_link_fkey" FOREIGN KEY (latest_reply_id) REFERENCES links_publicreply(id) ON UPDATE CASCADE ON DELETE CASCADE
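The hardware it runs on has 8 cores and 60 GB of memory. The PostgreSQL DB shares this machine with the Django (Python) app. I have been monitoring the server's performance and I don't see a bottleneck there.
Is there any way to improve this query's performance? As an accidental DBA, I would love advice on all my options here (if any). My overall goal is to retire old rows (e.g. older than 4 months) from this table.
P.S. If you need more information to solve this, please let me know.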

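"The query is: select count(*) from links_publicreply where submitted_on >= current_date - interval '1 day';. Pretty basic stuff."
It also forces a table scan because, the way I read your table definition, you have no index on the submitted_on column. If you run this type of query several times every night, then, you know, indexing that column may help.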
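To make the indexing suggestion concrete, here is a minimal sketch. The index name links_publicreply_submitted_on is made up for illustration; the table and column names come from the \d output in the question. Whether the planner actually picks the index, and how much time it saves, has to be confirmed on the real data.

```sql
-- Hypothetical index name. CONCURRENTLY avoids blocking writes while the
-- index builds, but it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY links_publicreply_submitted_on
    ON links_publicreply (submitted_on);

-- Refresh planner statistics for the table.
ANALYZE links_publicreply;

-- Re-check the plan of the slow query; BUFFERS shows how many pages were
-- found in shared buffers versus read from outside them.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM links_publicreply
WHERE submitted_on >= current_date - interval '1 day';
```

The same index would presumably also support the range predicate of a later purge of rows older than 4 months, since that filters on submitted_on as well.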
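You talk about RAM, but you don't talk about disks. If you are running out of RAM (possibly a configuration issue), then you may be hitting the slowest of all disks…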
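One way to get a rough idea of whether reads on this table are served from memory is to look at the memory settings and the per-table block statistics. This is only a sketch: the hit ratio below is an approximation, and blocks counted as "read" may still come from the operating system's page cache rather than from disk.

```sql
-- Current memory-related settings.
SHOW shared_buffers;
SHOW effective_cache_size;

-- Heap blocks found in shared buffers (hit) versus fetched from outside
-- them (read) for the table in question, accumulated since the last
-- statistics reset.
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 4) AS heap_hit_ratio
FROM pg_statio_user_tables
WHERE relname = 'links_publicreply';
```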

I ran into a similar problem (count(*)) on a table with 10 columns and 700,000 rows…
Here is a simple workaround:
-- Count rows per pool_id once in a CTE, then filter the aggregated result.
with pool_count as (
    select epc.pool_id, count(epc.pool_id) as row_count
    from epc
    group by epc.pool_id
)
select * from pool_count where pool_id > 1000;
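In my case this greatly reduced the processing time, because the subquery avoids having the system search all columns and all rows to work out uniqueness.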

Source: https://dba.stackexchange.com/questions/217174
