PostgreSQL
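Multiple concurrent "REFRESH MATERIALIZED VIEW" calls: how to manage them?
I have a large Postgres database with many tables, some of which have tens of millions of rows. Several worker processes update the database concurrently. To make searches faster, the relevant data is compiled into a materialized view.
Several parallel processes may write to the database and then refresh the materialized view. However, because a "REFRESH MATERIALIZED VIEW" query takes at least several minutes, these queries often pile up in a queue and are all executed one after another.
Unfortunately, in that situation only the most recent query is relevant; all the earlier ones burn processing time refreshing data that is already stale. Is there a way to stop a "REFRESH MATERIALIZED VIEW" call that is already running when a new one is issued?
Note that "REFRESH MATERIALIZED VIEW CONCURRENTLY" behaves the same way but slows the refresh down considerably (from a few minutes to as much as an hour), so it makes the performance problem worse.
Of course, each new query could check for an existing lock on the view, so it would be easy to cancel the new query; the problem is that I would rather cancel the older queries and keep only the newest one...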
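For illustration only, cancelling the older refreshes before issuing a new one could be sketched roughly as follows. This is a minimal sketch under assumptions, not a tested recipe: the view name `search_view` is hypothetical, matching on the query text is crude (it misses statements with leading whitespace or comments), and `pg_cancel_backend` only works on backends owned by the same role unless the caller is a superuser or a member of `pg_signal_backend`.

```sql
-- Hypothetical view name: search_view (not from the original question).
-- Cancel every other session that is running, or queued behind a lock
-- waiting to run, a refresh of that view.
SELECT pid,
       pg_cancel_backend(pid) AS cancel_sent
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()          -- never cancel ourselves
  AND state = 'active'                 -- lock waiters are also 'active'
  AND query ILIKE 'REFRESH MATERIALIZED VIEW%search_view%';

-- Then start the refresh that should win.
REFRESH MATERIALIZED VIEW search_view;
```

Cancelling a refresh that is already running throws away the work it has done so far, so under a steady stream of writers the view might never finish refreshing at all.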
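It sounds like you want to refresh the materialized view whenever the data in the tables change. If you do that with a materialized view, each refresh will take a long time, and the refreshes will block each other as well as queries.
Perhaps you could build your own "refresh on commit" materialized view as an ordinary table that is kept up to date by triggers.
A simple example:
Instead of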
CREATE MATERIALIZED VIEW sum_eumel AS
   SELECT eumel_category,
          sum(eumel_data) AS eumel_sum
   FROM eumel
   GROUP BY eumel_category;
you could do this:
BEGIN;

CREATE TABLE sum_eumel (
   eumel_category text NOT NULL PRIMARY KEY,
   eumel_sum bigint NOT NULL DEFAULT 0
);

CREATE FUNCTION eumel_trigger() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF TG_OP IN ('UPDATE', 'DELETE') THEN
      /*
       * Will leave rows with value 0 after the last
       * row for a category has been deleted.
       */
      UPDATE sum_eumel
      SET eumel_sum = eumel_sum - OLD.eumel_data
      WHERE eumel_category = OLD.eumel_category;
   END IF;

   IF TG_OP IN ('INSERT', 'UPDATE') THEN
      /* Add the new value, creating the category row if needed. */
      INSERT INTO sum_eumel (eumel_category, eumel_sum)
         VALUES (NEW.eumel_category, NEW.eumel_data)
         ON CONFLICT (eumel_category) DO UPDATE
         SET eumel_sum = sum_eumel.eumel_sum + EXCLUDED.eumel_data;
   END IF;

   IF TG_OP = 'TRUNCATE' THEN
      TRUNCATE sum_eumel;
      RETURN NULL;
   END IF;

   IF TG_OP = 'DELETE' THEN
      RETURN OLD;
   ELSE
      RETURN NEW;
   END IF;
END;$$;

CREATE TRIGGER eumel_dml_trig
   AFTER INSERT OR UPDATE OR DELETE ON eumel
   FOR EACH ROW EXECUTE PROCEDURE eumel_trigger();

CREATE TRIGGER eumel_truncate_trig
   AFTER TRUNCATE ON eumel
   FOR EACH STATEMENT EXECUTE PROCEDURE eumel_trigger();

/* Populate the summary table with the current totals. */
INSERT INTO sum_eumel
   SELECT eumel_category,
          sum(eumel_data) AS eumel_sum
   FROM eumel
   GROUP BY eumel_category;

COMMIT;
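To see how the summary table tracks the base table, a short session along these lines may help. It is only a sketch: the original does not show the definition of `eumel`, so the column types below are assumptions, and the table has to exist before the trigger setup above is run.

```sql
-- Assumed shape of the base table (not given in the original).
-- Create it first, then run the trigger setup above.
CREATE TABLE eumel (
   eumel_category text NOT NULL,
   eumel_data integer NOT NULL
);

-- After the setup has been run successfully:
INSERT INTO eumel (eumel_category, eumel_data)
   VALUES ('a', 10), ('a', 5), ('b', 7);

-- An UPDATE subtracts the old value and adds the new one.
UPDATE eumel SET eumel_data = 1 WHERE eumel_category = 'b';

-- A DELETE subtracts the removed value.
DELETE FROM eumel WHERE eumel_category = 'a' AND eumel_data = 5;

-- The totals are available immediately, with no refresh step.
SELECT eumel_category, eumel_sum
FROM sum_eumel
ORDER BY eumel_category;
```

With the trigger as written, the final query should report a total of 10 for category `a` and 1 for category `b`. Each writer only touches the summary rows for the categories it modifies, so concurrent writers block each other only when they change the same category.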