How can a sum operation be so much faster than a count?

December 6, 2018

I ran a simple benchmark on a table with 11,065,763 records and got the following results:

select sum(t.amount * t.exchange) from table t;

Finalize Aggregate  (cost=658027.39..658027.40 rows=1 width=8) (actual time=2391.248..2391.248 rows=1 loops=1)
 Buffers: shared hit=25366 read=550786 dirtied=18 written=18
 I/O Timings: read=1978.454 write=0.205
 ->  Gather  (cost=658027.17..658027.38 rows=2 width=8) (actual time=2391.100..2391.229 rows=3 loops=1)
       Workers Planned: 2
       Workers Launched: 2
       Buffers: shared hit=25366 read=550786 dirtied=18 written=18
       I/O Timings: read=1978.454 write=0.205
       ->  Partial Aggregate  (cost=657027.17..657027.18 rows=1 width=8) (actual time=2377.613..2377.613 rows=1 loops=3)
             Buffers: shared hit=24961 read=550759 dirtied=18 written=18
             I/O Timings: read=1977.930 write=0.205
             ->  Parallel Seq Scan on odeme_kaydi ok  (cost=0.00..622181.24 rows=4646124 width=16) (actual time=0.084..1972.061 rows=3688816 loops=3)
                   Buffers: shared hit=24961 read=550759 dirtied=18 written=18
                   I/O Timings: read=1977.930 write=0.205
Planning time: 0.279 ms
Execution time: 2408.745 ms


select count(t.id) from table t;

Aggregate  (cost=489270.14..489270.15 rows=1 width=8) (actual time=12256.560..12256.560 rows=1 loops=1)
 Buffers: shared hit=6902688 read=372054 dirtied=32
 I/O Timings: read=3067.841
 ->  Index Only Scan using pk_odeme_kaydi_id on odeme_kaydi ok  (cost=0.43..461393.25 rows=11150756 width=8) (actual time=0.169..11583.174 rows=11066478 loops=1)
       Heap Fetches: 4085161
       Buffers: shared hit=6902688 read=372054 dirtied=32
       I/O Timings: read=3067.841
Planning time: 0.110 ms
Execution time: 12256.609 ms

Note: the amount column in the table is always greater than 0. The PostgreSQL version is 9.5.

Is this normal, or is there something tricky going on?

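Typically there is not much difference between sum() and count(). The number of data pages that have to be read is the main factor for performance. If anything, count() is usually faster, especially count(*):
For absolute performance, is SUM faster or COUNT?
Your sum uses a Parallel Seq Scan, which turns out to be much faster. The count, on the other hand, uses an Index Only Scan, which is normally the fastest way, but there is no parallel support for it there. That is probably what makes the difference. (This may be a limitation of pg 9.5, not sure.)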
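You are using Postgres 9.5. Parallel query is rather new, and its cost estimation is not very accurate. Similar considerations apply to index-only scans, to a lesser degree. So the planner made at least one bad cost estimate: 489270.15 for the count using an index-only scan versus 658027.40 for the sum, where an index-only scan is probably not possible (no index covers amount and exchange). Maybe your cost settings also do not reflect reality well, which is typically involved in bad estimates. See:
Keep PostgreSQL from sometimes choosing a bad query plan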
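
One way to check whether the cost settings are part of the problem is to inspect them and re-run the plan with a changed value in a single session. This is only a sketch: the table name odeme_kaydi is taken from the plan output above, and the value 1.1 is an illustrative assumption, not a setting recommended in this answer.

```sql
-- Inspect the planner cost settings currently in effect.
SHOW seq_page_cost;
SHOW random_page_cost;
SHOW effective_cache_size;

-- Assumption: 1.1 is an illustrative value for experimentation only.
-- SET affects just the current session.
SET random_page_cost = 1.1;

-- Re-check which plan the planner picks and how long it really takes.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(t.id) FROM odeme_kaydi t;

-- Restore the configured default for this session.
RESET random_page_cost;
```

Comparing the estimated cost with the actual time before and after the change gives a rough idea of how far the estimates are from reality.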
I also see Heap Fetches: 4085161 for the count, which surprises me for an index-only scan over 11M rows. Not sure why that is; maybe a VACUUM ANALYZE on the table would change things.
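
A minimal sketch of that check, assuming the table is odeme_kaydi as shown in the plan output. Index-only scans have to visit the heap for pages not marked all-visible in the visibility map, and VACUUM is what updates that map.

```sql
-- Update the visibility map and refresh planner statistics.
-- Note: VACUUM cannot run inside a transaction block.
VACUUM ANALYZE odeme_kaydi;

-- Compare the number of all-visible pages with the total page count.
-- The closer relallvisible is to relpages, the fewer heap fetches
-- an index-only scan should need.
SELECT relpages, relallvisible
FROM pg_class
WHERE oid = 'odeme_kaydi'::regclass;
```

Re-running the count with EXPLAIN (ANALYZE, BUFFERS) afterwards shows whether Heap Fetches has dropped.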
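Apart from that, since t.id is defined NOT NULL, use count(*) instead of count(t.id). It is a bit faster because Postgres does not need to look at the stored (index) tuple at all. The mere existence of a row is enough.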
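
To compare the two variants directly, something like the following could be run, again assuming the table name odeme_kaydi from the plan output. Both return the same number as long as id is NOT NULL.

```sql
-- Counts rows where id is not NULL, so the value has to be inspected.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(t.id) FROM odeme_kaydi t;

-- Counts rows as such; no column value needs to be examined.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM odeme_kaydi t;
```

Run each statement more than once, since the first execution may be dominated by reading pages from disk.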
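And upgrade to a current version of Postgres, where either variant will be substantially faster. There have been major improvements for big data and parallel query.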
By the way, if the count does not have to be exact, there are even faster options:
Fast way to discover the row count of a table in PostgreSQL
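
One such option, sketched here under the assumption that the table is odeme_kaydi, reads the planner's row estimate from the system catalog instead of scanning the table. The value is only as fresh as the last VACUUM or ANALYZE (manual or by autovacuum).

```sql
-- Estimated row count from catalog statistics; returns almost instantly.
-- This is an estimate, not an exact count.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'odeme_kaydi'::regclass;
```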

Source: https://dba.stackexchange.com/questions/224289
