PostgreSQL

Postgres 9.5 foreign table inheritance not using index

  • May 21, 2016

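In PostgreSQL 9.5.0 I have a partitioned table that collects data by month. I tried the new foreign table inheritance feature and pushed one month of data to another PostgreSQL server, so I ended up with one foreign table. When I run my query from the master server, it takes 7 times longer than on the server that holds the data behind the foreign table. I am not passing a lot of data over the network. My query looks like this: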

explain analyze
SELECT source, global_action, paid, organic, device, count(*) as count, sum(price) as sum
FROM "toys"
WHERE "toys"."container_id" = 857 AND (toys.created_at >= '2015-12-02 05:00:00.000000') AND
(toys.created_at <= '2015-12-30 04:59:59.999999') AND ("toys"."source" IS NOT NULL)
GROUP BY "toys"."source", "toys"."global_action", "toys"."paid", "toys"."organic", "toys"."device";

HashAggregate  (cost=1143634.94..1143649.10 rows=1133 width=15) (actual time=1556.894..1557.017 rows=372 loops=1)
  Group Key: toys.source, toys.global_action, toys.paid, toys.organic, toys.device
  ->  Append  (cost=0.00..1143585.38 rows=2832 width=15) (actual time=113.420..1507.373 rows=76593 loops=1)
        ->  Seq Scan on toys  (cost=0.00..0.00 rows=1 width=242) (actual time=0.001..0.001 rows=0 loops=1)
              Filter: ((source IS NOT NULL) AND (created_at >= '2015-12-02 05:00:00'::timestamp without time zone) AND (created_at <= '2015-12-30 04:59:59.999999'::timestamp without time zone) AND (container_id = 857))
        ->  Foreign Scan on toys_201512_new  (cost=100.00..1143585.38 rows=2831 width=15) (actual time=113.419..1488.445 rows=76593 loops=1)
Planning time: 2.990 ms
Execution time: 1560.131 ms

Does PostgreSQL use indexes on the foreign table? (I have indexes defined on the foreign table.) If I execute the query directly on that server, it takes 200 ms.

Here is the parent table definition:

Table "public.toys"
 Column                       | Type                        | Modifiers
------------------------------+-----------------------------+------------------------
 id                           | bigint                      |
 job_reference                | character varying(100)      |
 container_id                 | integer                     |
 user_token                   | character varying(1000)     |
 user_ip                      | character varying(100)      |
 user_zip                     | character varying(10)       |
 user_agent                   | character varying(2000)     |
 url_referrer                 | character varying(2000)     |
 page_url                     | character varying(2000)     |
 source                       | character varying(100)      |
 action                       | integer                     |
 created_at                   | timestamp without time zone |
 cpa                          | numeric(9,3)                | not null default 0.0
 duplicate                    | boolean                     | not null default false
 fingerprint                  | character varying(255)      |
 email                        | character varying(1000)     |
 mobile_email_apply           | boolean                     |
 country                      | character varying(255)      |
 country_matched              | boolean                     |
 device                       | integer                     |
 organic                      | boolean                     |
 job_seeker_id                | character varying(255)      |
 applicant_status             | integer                     |
 ats_applicant_status         | character varying(255)      |
 ats_applicant_source         | character varying(255)      |
 price                        | numeric(9,4)                |
 job_group_id                 | integer                     |
 analytic_source              | character varying(255)      |
 global_action                | integer                     |
 paid_organic                 | integer                     |
 paid                         | boolean                     |
 meta                         | text                        |
 params                       | character varying(2000)     |
 analytic_associated_click_id | bigint                      |
 external_id                  | character varying(100)      |
 associated_click_id          | bigint                      |
 cpc                          | numeric(9,3)                |
Indexes:
   "job_stats_master_pkey1" PRIMARY KEY, btree (id)

The child table has a check constraint:

"toys_201512_new_created_at_check" CHECK (
   created_at >= '2015-11-30 19:00:00'::timestamp without time zone AND
   created_at <  '2015-12-31 19:00:00'::timestamp without time zone)
Inherits: toys

And indexes:

"toys_201512_new_analytic_source" btree (analytic_source)
"toys_201512_new_country" btree (country)
"toys_201512_new_created_at" btree (created_at)
"toys_201512_new_duplicate" btree (duplicate) WHERE duplicate = false
"toys_201512_new_container_id" btree (container_id)
"toys_201512_new_container_id_created_at" btree (container_id, created_at)
"toys_201512_new_fingerprint" btree (fingerprint)
"toys_201512_new_global_action" btree (global_action)
"toys_201512_new_id" btree (id)
"toys_201512_new_job_group_id" btree (job_group_id)
"toys_201512_new_job_reference" btree (job_reference)
"toys_201512_new_on_country_matched" btree (country_matched) WHERE country_matched = true
"toys_201512_new_on_cpa" btree (cpa) WHERE cpa <> 0::numeric
"toys_201512_new_on_duplicate_and_country_matched" btree (duplicate, country_matched) WHERE duplicate = false AND country_matched = true
"toys_201512_new_on_mobile_email_apply" btree (mobile_email_apply) WHERE mobile_email_apply = true
"toys_201512_new_source" btree (source)
"toys_201512_new_user_ip_user_agent" btree (user_ip, user_agent)
"toys_201512_new_user_token" btree (user_token)

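**Postgres can use indexes on the foreign server.** But there are a number of obstacles compared to local tables. Read the chapter "Remote Query Optimization" in the manual.

Comments in the current source code of postgres_fdw.c are also relevant: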

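521 * [...] For a foreign
522 * table, we don't know what indexes are present on the remote side but
523 * want to speculate about which ones we'd like to use if they existed.
...
675 * We have to be a bit careful here.  It would sure be nice to have a local
676 * cache of information about remote index definitions...
...
722 * corresponds to a SeqScan path for regular tables (though depending on what
723 * baserestrict conditions we were able to send to remote, there might
724 * actually be an indexscan happening there).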

Index

This index of yours looks well suited to the query:

"toys_201512_new_container_id_created_at" btree (container_id, created_at)

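If you have many NULL values in source, you might even make it a partial index by appending WHERE source IS NOT NULL, which makes the index look more attractive to the Postgres query planner.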
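
As a rough sketch, such a partial index might be created as follows. It would have to be created on the remote server, where the data actually lives. The index name is made up for illustration, and the statement assumes the remote table has the same name as the foreign table, which the question does not confirm.

```sql
-- Run on the REMOTE server (the one that physically stores the December data).
-- Assumption: the remote table is also called toys_201512_new.
-- The index name below is hypothetical.
CREATE INDEX toys_201512_new_container_id_created_at_src
    ON toys_201512_new (container_id, created_at)
    WHERE source IS NOT NULL;  -- partial index: skips rows where source is NULL
```

The partial index only pays off if a meaningful share of rows has a NULL source; otherwise the existing index on (container_id, created_at) should serve just as well.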

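Statistics for the query planner

Make sure the query planner has valid statistics to work with. The numbers in your EXPLAIN output show a large mismatch:

Foreign Scan on toys_201512_new  (cost=100.00..1143585.38 **rows=2831** width=15)
(actual time=113.419..1488.445 **rows=76593** loops=1)

The actual number of rows returned is 27 times what Postgres expected. The manual:

Running ANALYZE on a foreign table is the way to update the local statistics; this will perform a scan of the remote table and then calculate and store statistics just as though the table were local. Keeping local statistics can be a useful way to reduce per-query planning overhead for a remote table, but if the remote table is frequently updated, the local statistics will soon be obsolete.

Since accessing foreign tables can be expensive or delicate, this does not happen automatically. Foreign tables are not covered by autovacuum. The manual:

Foreign tables are analyzed only when explicitly selected.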
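
A minimal sketch of doing this by hand, run on the local server (the one holding the foreign table definition). The follow-up query against pg_class is only one possible way to look at the row estimate the planner has stored.

```sql
-- Run on the LOCAL server. This scans the remote table to collect a sample,
-- so it can take a while on a large table.
ANALYZE toys_201512_new;

-- Optional check: the planner's stored row-count estimate for the foreign table.
SELECT relname, reltuples
FROM   pg_class
WHERE  relname = 'toys_201512_new';
```

Since autovacuum does not do this for foreign tables, the ANALYZE has to be repeated (manually or from a scheduled job) whenever the remote data changes significantly.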

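If the remote table changes a lot, you may want to enable **use_remote_estimate**. The manual:

This option, which can be specified for a foreign table or a foreign server, controls whether postgres_fdw issues remote EXPLAIN commands to obtain cost estimates. A setting for a foreign table overrides any setting for its server, but only for that table. The default is false.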
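
A sketch of how the option could be set, assuming it has not been set before (ADD fails if the option already exists; use SET in that case). The server name remote_toys_server is a placeholder, since the question does not show the actual foreign server name.

```sql
-- Per table: only this foreign table uses remote estimates.
ALTER FOREIGN TABLE toys_201512_new
    OPTIONS (ADD use_remote_estimate 'true');

-- Alternatively, per server: applies to all foreign tables of that server
-- unless a table overrides it. "remote_toys_server" is a hypothetical name.
-- ALTER SERVER remote_toys_server
--     OPTIONS (ADD use_remote_estimate 'true');
```

Remote estimates add round trips during planning, so this trades some planning time for better row and cost estimates.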

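Finally, test what is actually sent to the foreign server. The manual:

The query that is actually sent to the remote server for execution can be examined using EXPLAIN VERBOSE.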
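
For the query from the question, that check might look like the sketch below. In the resulting plan, the Foreign Scan node should carry a "Remote SQL" line showing the statement shipped to the remote server, including whichever WHERE conditions were pushed down.

```sql
-- Run on the LOCAL server. Add ANALYZE to the option list to also execute
-- the query and see actual timings.
EXPLAIN (VERBOSE)
SELECT source, global_action, paid, organic, device
     , count(*) AS count, sum(price) AS sum
FROM   toys
WHERE  container_id = 857
AND    created_at >= '2015-12-02 05:00:00'
AND    created_at <  '2015-12-30 05:00:00'
AND    source IS NOT NULL
GROUP  BY source, global_action, paid, organic, device;
```

If a condition is missing from the remote statement, it is being applied locally after the rows have been fetched, and the remote server cannot use an index for it.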

Query

Your query, tidied up and formatted, with one minor improvement:

SELECT source, global_action, paid, organic, device
    , count(*) AS count, sum(price) AS sum
FROM   toys
WHERE  container_id = 857
AND    created_at >= '2015-12-02 05:00:00'
AND    created_at <  '2015-12-30 05:00:00'
-- instead of: AND created_at <= '2015-12-30 04:59:59.999999'
AND    source IS NOT NULL
GROUP  BY source, global_action, paid, organic, device;

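Simpler, cleaner, a better match for your CHECK constraint (which also uses a half-open range with < on the upper bound), and it avoids possible corner-case problems.

Quoted from: https://dba.stackexchange.com/questions/127535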