Postgresql

Sum() 聚合查詢的 PostgreSQL 9.6 性能問題

  • February 10, 2018

我有下表:

    id                | bigint                      | not null default nextval('shares_id_seq'::regclass)
    poolid            | text                        | not null
    blockheight       | bigint                      | not null
    networkdifficulty | double precision            | not null
    miner             | text                        | not null
    worker            | text                        |
    ipaddress         | text                        | not null
    created           | timestamp without time zone | not null
    useragent         | text                        |
    payoutinfo        | text                        |
    difficulty        | double precision            | not null default 0

Indexes:
       "shares_pkey" PRIMARY KEY, btree (id)
       "idx_shares_pool_block" btree (poolid, blockheight)
       "idx_shares_pool_created" btree (poolid, created)
       "idx_shares_pool_miner" btree (poolid, miner)
       "idx_shares_pool_miner_diff" btree (poolid, miner, difficulty)

我無法理解為什麼以下查詢需要這麼長時間:

explain analyze SELECT SUM(difficulty) FROM shares WHERE poolid = 'xmr1' AND miner = '4BCeEPhodgPMbPWFN1dPwhWXdRX8q4mhhdZdA1dtSMLTLCEYvAj9QXjXAfF7CugEbmfBhgkqHbdgK9b2wKA6nqRZQCgvCDm';
                                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=4150712.54..4150712.55 rows=1 width=8) (actual time=25490.101..25490.101 rows=1 loops=1)
  ->  Bitmap Heap Scan on shares  (cost=389414.64..4143195.97 rows=3006629 width=8) (actual time=2499.409..24815.011 rows=7445802 loops=1)
        Recheck Cond: ((poolid = 'xmr1'::text) AND (miner = '4BCeEPhodgPMbPWFN1dPwhWXdRX8q4mhhdZdA1dtSMLTLCEYvAj9QXjXAfF7CugEbmfBhgkqHbdgK9b2wKA6nqRZQCgvCDm'::text))
        Rows Removed by Index Recheck: 55232916
        Heap Blocks: exact=149273 lossy=2641988
        ->  Bitmap Index Scan on idx_shares_pool_miner  (cost=0.00..388662.98 rows=3006629 width=0) (actual time=2449.977..2449.977 rows=7445802 loops=1)
              Index Cond: ((poolid = 'xmr1'::text) AND (miner = '4BCeEPhodgPMbPWFN1dPwhWXdRX8q4mhhdZdA1dtSMLTLCEYvAj9QXjXAfF7CugEbmfBhgkqHbdgK9b2wKA6nqRZQCgvCDm'::text))
Planning time: 0.256 ms
Execution time: 25490.137 ms
(9 rows)

我專門為這種情況創建了索引idx_shares_pool_miner_diff,但它甚至沒有被使用。由於 MVCC,PostgreSQL 不能使用索引嗎?

更新:

按照建議執行vacuum analyze shares使 PostgreSQL 使用上述索引:

Aggregate  (cost=546165.94..546165.95 rows=1 width=8) (actual time=2489.446..2489.447 rows=1 loops=1)
  ->  Index Only Scan using idx_shares_pool_miner_diff on shares  (cost=0.69..537874.79 rows=3316458 width=8) (actual time=0.041..1924.396 rows=7461785 loops=1)
        Index Cond: ((poolid = 'xmr1'::text) AND (miner = '4BCeEPhodgPMbPWFN1dPwhWXdRX8q4mhhdZdA1dtSMLTLCEYvAj9QXjXAfF7CugEbmfBhgkqHbdgK9b2wKA6nqRZQCgvCDm'::text))
        Heap Fetches: 16575
Planning time: 0.122 ms
Execution time: 2489.477 ms
(6 rows)

我認為在不使用並行工作器的情況下,它會盡可能快。

在閱讀了有關 index-only-scans 的文件後,由於共享表上持續的大量寫入活動(> 200 次插入/秒),我似乎受到了性能影響,從而導致了許多 Heap-Fetches。有趣的是,Heap-Fetches 的數量顯然只會增加,不會減少。

如果您的表僅是 INSERT,則有一些方法可以更快地獲得總和。

假設有一列具有單調遞增的值(例如idcreated在您的範例中),請創建一個MATERIALZED VIEW以預先計算早於(最近)給定門檻值的總和。然後只需將最近添加的總和添加到其中:

CREATE MATERIALIZED VIEW shares_summed AS
SELECT poolid, miner, SUM(difficulty) AS sum_diff
FROM   shares
GROUP  BY  poolid, miner
ORDER  BY  poolid, miner;  -- optional, but to optimize some more
WHERE  created <  '2018-01-01 0:0';

如果 有很多組合(poolid, miner),請在其上添加**UNIQUE**索引。還需要刷新CONCURRENTLY,見下文。

從該表中獲取一行幾乎沒有成本。然後你只添加最近添加的內容:

SELECT sum(sum_diff) AS total_sum  -- takes care of possible missing rows
FROM  (
  SELECT sum_diff
  FROM   shares_summed
  WHERE  poolid = 'xmr1'
  AND    miner = '4BCeEPhod...'

  UNION ALL
  SELECT SUM(difficulty)
  FROM   shares
  WHERE  poolid = 'xmr1'
  AND    miner  = '4BCeEPhod...'
  AND    created >= '2018-01-01 0:0'
  ) sub;                                                   

您只需不時調整門檻值和REFRESHMV。最好使用該**CONCURRENTLY選項,以避免許多s 的並發問題。**INSERT考慮:

記住值並相應地調整查詢。您可以將其儲存在另一個表中。

為了讓它更快,創建一個分區表並在每個月(或其他)之後添加另一個具有預先計算總和的分區。您可以輕鬆地安排每月一次的 cron 作業來自動執行此操作。然後門檻值始終是月初左右:

SELECT sum(sum_diff) AS total_sum
FROM  (
  SELECT SUM(difficulty) AS sum_diff
  FROM   shares_summed_master  -- includes all partitions
  WHERE  poolid = 'xmr1'
  AND    miner = '4BCeEPhod...'

  UNION ALL
  SELECT SUM(difficulty)
  FROM   shares
  WHERE  poolid = 'xmr1'
  AND    miner  = '4BCeEPhod...'
  AND    created >= date_trunc('month', now())  -- careful, current time zone affects it
  ) sub;           

確保沒有重疊,並且預先計算的總和是最新的。

Postgres 10極大地改進和簡化了表分區。一定要執行最新的版本,Postgres 10.2 中修復了一些極端情況下的錯誤。上面的手冊連結適用於 9.6 版。目前版本在這裡

引用自:https://dba.stackexchange.com/questions/197502