Sum() 聚合查詢的 PostgreSQL 9.6 性能問題
我有下表:
id | bigint | not null default nextval('shares_id_seq'::regclass) poolid | text | not null blockheight | bigint | not null networkdifficulty | double precision | not null miner | text | not null worker | text | ipaddress | text | not null created | timestamp without time zone | not null useragent | text | payoutinfo | text | difficulty | double precision | not null default 0 Indexes: "shares_pkey" PRIMARY KEY, btree (id) "idx_shares_pool_block" btree (poolid, blockheight) "idx_shares_pool_created" btree (poolid, created) "idx_shares_pool_miner" btree (poolid, miner) "idx_shares_pool_miner_diff" btree (poolid, miner, difficulty)
我無法理解為什麼以下查詢需要這麼長時間:
explain analyze SELECT SUM(difficulty) FROM shares WHERE poolid = 'xmr1' AND miner = '4BCeEPhodgPMbPWFN1dPwhWXdRX8q4mhhdZdA1dtSMLTLCEYvAj9QXjXAfF7CugEbmfBhgkqHbdgK9b2wKA6nqRZQCgvCDm'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=4150712.54..4150712.55 rows=1 width=8) (actual time=25490.101..25490.101 rows=1 loops=1) -> Bitmap Heap Scan on shares (cost=389414.64..4143195.97 rows=3006629 width=8) (actual time=2499.409..24815.011 rows=7445802 loops=1) Recheck Cond: ((poolid = 'xmr1'::text) AND (miner = '4BCeEPhodgPMbPWFN1dPwhWXdRX8q4mhhdZdA1dtSMLTLCEYvAj9QXjXAfF7CugEbmfBhgkqHbdgK9b2wKA6nqRZQCgvCDm'::text)) Rows Removed by Index Recheck: 55232916 Heap Blocks: exact=149273 lossy=2641988 -> Bitmap Index Scan on idx_shares_pool_miner (cost=0.00..388662.98 rows=3006629 width=0) (actual time=2449.977..2449.977 rows=7445802 loops=1) Index Cond: ((poolid = 'xmr1'::text) AND (miner = '4BCeEPhodgPMbPWFN1dPwhWXdRX8q4mhhdZdA1dtSMLTLCEYvAj9QXjXAfF7CugEbmfBhgkqHbdgK9b2wKA6nqRZQCgvCDm'::text)) Planning time: 0.256 ms Execution time: 25490.137 ms (9 rows)
我專門為這種情況創建了索引idx_shares_pool_miner_diff,但它甚至沒有被使用。由於 MVCC,PostgreSQL 不能使用索引嗎?
更新:
按照建議執行
vacuum analyze shares
使 PostgreSQL 使用上述索引:Aggregate (cost=546165.94..546165.95 rows=1 width=8) (actual time=2489.446..2489.447 rows=1 loops=1) -> Index Only Scan using idx_shares_pool_miner_diff on shares (cost=0.69..537874.79 rows=3316458 width=8) (actual time=0.041..1924.396 rows=7461785 loops=1) Index Cond: ((poolid = 'xmr1'::text) AND (miner = '4BCeEPhodgPMbPWFN1dPwhWXdRX8q4mhhdZdA1dtSMLTLCEYvAj9QXjXAfF7CugEbmfBhgkqHbdgK9b2wKA6nqRZQCgvCDm'::text)) Heap Fetches: 16575 Planning time: 0.122 ms Execution time: 2489.477 ms (6 rows)
我認為在不使用並行工作器的情況下,它會盡可能快。
在閱讀了有關 index-only-scans 的文件後,由於共享表上持續的大量寫入活動(> 200 次插入/秒),我似乎受到了性能影響,從而導致了許多 Heap-Fetches。有趣的是,Heap-Fetches 的數量顯然只會增加,不會減少。
如果您的表僅是 INSERT,則有一些方法可以更快地獲得總和。
假設有一列具有單調遞增的值(例如
id
或created
在您的範例中),請創建一個MATERIALZED VIEW
以預先計算早於(最近)給定門檻值的總和。然後只需將最近添加的總和添加到其中:CREATE MATERIALIZED VIEW shares_summed AS SELECT poolid, miner, SUM(difficulty) AS sum_diff FROM shares GROUP BY poolid, miner ORDER BY poolid, miner; -- optional, but to optimize some more WHERE created < '2018-01-01 0:0';
如果 有很多組合
(poolid, miner)
,請在其上添加**UNIQUE
**索引。還需要刷新CONCURRENTLY
,見下文。從該表中獲取一行幾乎沒有成本。然後你只添加最近添加的內容:
SELECT sum(sum_diff) AS total_sum -- takes care of possible missing rows FROM ( SELECT sum_diff FROM shares_summed WHERE poolid = 'xmr1' AND miner = '4BCeEPhod...' UNION ALL SELECT SUM(difficulty) FROM shares WHERE poolid = 'xmr1' AND miner = '4BCeEPhod...' AND created >= '2018-01-01 0:0' ) sub;
您只需不時調整門檻值和
REFRESH
MV。最好使用該**CONCURRENTLY
選項,以避免許多s 的並發問題。**INSERT
考慮:記住值並相應地調整查詢。您可以將其儲存在另一個表中。
為了讓它更快,創建一個分區表並在每個月(或其他)之後添加另一個具有預先計算總和的分區。您可以輕鬆地安排每月一次的 cron 作業來自動執行此操作。然後門檻值始終是月初左右:
SELECT sum(sum_diff) AS total_sum FROM ( SELECT SUM(difficulty) AS sum_diff FROM shares_summed_master -- includes all partitions WHERE poolid = 'xmr1' AND miner = '4BCeEPhod...' UNION ALL SELECT SUM(difficulty) FROM shares WHERE poolid = 'xmr1' AND miner = '4BCeEPhod...' AND created >= date_trunc('month', now()) -- careful, current time zone affects it ) sub;
確保沒有重疊,並且預先計算的總和是最新的。
Postgres 10極大地改進和簡化了表分區。一定要執行最新的版本,Postgres 10.2 中修復了一些極端情況下的錯誤。上面的手冊連結適用於 9.6 版。目前版本在這裡。