PostgreSQL

Optimizing a query to get rows where x falls between pairs of values from another table

  • January 27, 2016

PostgreSQL 9.4.5

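I have two tables in my database, vcfentries and exons.

I want to query the rows in vcfentries whose pos lies between any of the exonstart/exonend position pairs listed in the exons table.

Result of the query select * from exons where exons.genename = 'NM_001301824':

[image: exons table]

Total query runtime: 11 ms. 8 rows retrieved.

This is the equivalent of fetching the results for a single exon without the cross-table join: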

Select pos,chrom,alt,ref from vcfentries 
where chrom = '1' 
and pos > 33546679  
and pos < 33547159

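Total query runtime: 11 ms. 89 rows retrieved.

These are the indexes I currently have on the tables:

[images: table indexes]

Querying a specific exon is efficient: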

select vcfentries.pos,vcfentries.chrom,vcfentries.alt,vcfentries.ref,exons.exonnumber from exons , vcfentries
where vcfentries.pos BETWEEN  exons.exonstart
and exons.exonend
and exons.genename = 'NM_001301824'
and exonnumber = 6
and vcfentries.chrom = exons.chrom

Total query runtime: 72 ms. 137 rows retrieved.

"Nested Loop  (cost=438468.31..2324377.64 rows=809587 width=24)"
"  ->  Index Only Scan using exonspkey on exons  (cost=0.42..8.44 rows=1 width=26)"
"        Index Cond: ((genename = 'NM_001301824'::text) AND (exonnumber = 8))"
"  ->  Bitmap Heap Scan on vcfentries  (cost=438467.89..2317823.87 rows=654532 width=16)"
"        Recheck Cond: ((pos >= exons.exonstart) AND (pos <= exons.exonend) AND (chrom = exons.chrom))"
"        ->  Bitmap Index Scan on vcfentries_pos_chrom_idx  (cost=0.00..438304.26 rows=654532 width=0)"
"              Index Cond: ((pos >= exons.exonstart) AND (pos <= exons.exonend) AND (chrom = exons.chrom))"

When I query all exons of a gene at once, performance collapses. The runtime suddenly jumps into the range of minutes:

select * from exons , vcfentries
where vcfentries.pos BETWEEN  exons.exonstart
and exons.exonend
and exons.genename = 'NM_001037501'
and vcfentries.chrom = exons.chrom

Total query runtime: 325389 ms. 2331 rows retrieved.

"Hash Join  (cost=58.73..11528494.14 rows=11334216 width=24)"
"  Output: vcfentries.pos, vcfentries.chrom, vcfentries.alt, vcfentries.ref, exons.exonnumber"
"  Hash Cond: (vcfentries.chrom = exons.chrom)"
"  Join Filter: ((vcfentries.pos >= exons.exonstart) AND (vcfentries.pos <= exons.exonend))"
"  ->  Seq Scan on coeus.vcfentries  (cost=0.00..7170736.76 rows=141378976 width=16)"
"        Output: vcfentries.pos, vcfentries.chrom, vcfentries.alt, vcfentries.ref, vcfentries.analysisid, vcfentries.filter, vcfentries.info_ac, vcfentries.info_af, vcfentries.info_an, vcfentries.info_baseqranksum, vcfentries.info_clippingranksum, vcfentrie (...)"
"  ->  Hash  (cost=58.56..58.56 rows=14 width=26)"
"        Output: exons.exonnumber, exons.exonstart, exons.exonend, exons.chrom"
"        ->  Bitmap Heap Scan on coeus.exons  (cost=4.53..58.56 rows=14 width=26)"
"              Output: exons.exonnumber, exons.exonstart, exons.exonend, exons.chrom"
"              Recheck Cond: (exons.genename = 'NM_001301824'::text)"
"              ->  Bitmap Index Scan on exons_genename_idx  (cost=0.00..4.53 rows=14 width=0)"
"                    Index Cond: (exons.genename = 'NM_001301824'::text)"

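The exons table is moderately sized, with only about 400K rows.

The vcfentries table is large, with several hundred million rows, but queries that use the index run at an acceptable speed.

I have not been able to optimize this query. When I run EXPLAIN on a version that uses an explicit join, I get the same execution plan.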
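The explicit-join variant is not shown in the post. A minimal sketch of what it might look like, assuming the same tables and columns as the queries above:

```sql
-- Sketch only: the same query written with explicit JOIN ... ON syntax.
-- The exact statement the author tried is an assumption.
SELECT v.pos, v.chrom, v.alt, v.ref, e.exonnumber
FROM exons e
JOIN vcfentries v
  ON v.chrom = e.chrom
 AND v.pos BETWEEN e.exonstart AND e.exonend
WHERE e.genename = 'NM_001037501';
```

For inner joins, PostgreSQL treats the comma-separated FROM list and explicit JOIN syntax as equivalent when planning (within the default join_collapse_limit), which is consistent with both forms producing the same plan.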

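Any ideas why it produces such a bad execution plan, and any suggested fixes or a better query?

The solution I implemented was to add a plpgsql function that runs the query for each exon individually and returns the combined results as a table.

I defined an output record type and used this function: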

DROP FUNCTION IF EXISTS coeus.getexonvcf(text);
CREATE OR REPLACE FUNCTION coeus.getexonvcf(text) RETURNS SETOF coeus.exonvcf AS
$BODY$
DECLARE
   r bigint;         -- current exon number
   s coeus.exonvcf;  -- one output row
BEGIN
   -- Outer loop: one iteration per exon of the requested gene ($1)
   FOR r IN SELECT exonnumber FROM coeus.exons WHERE exons.genename = $1
   LOOP
       -- Inner query: the single-exon lookup that was fast when run on its own
       FOR s IN SELECT ex.genename, ex.exonnumber, ex.direction, ex.padding, vcf.*
                FROM coeus.exons ex, coeus.vcfentries vcf
                WHERE vcf.pos BETWEEN ex.exonstart AND ex.exonend
                  AND ex.genename = $1
                  AND ex.exonnumber = r
                  AND vcf.chrom = ex.chrom
       LOOP
           RETURN NEXT s;
       END LOOP;
   END LOOP;
   RETURN;
END
$BODY$
LANGUAGE plpgsql;

This avoids any full table scan and runs in a little over a second.

Total query runtime: 1493 ms. 2331 rows retrieved.
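A set-returning function like this is called in the FROM clause as if it were a table. A short usage sketch, assuming the coeus.exonvcf type exposes a column named exonnumber (the type definition is not shown in the post):

```sql
-- Fetch all variants that fall inside any exon of one gene
SELECT * FROM coeus.getexonvcf('NM_001037501');

-- Count variants per exon; the exonnumber column name is an assumption
SELECT exonnumber, count(*) AS variants
FROM coeus.getexonvcf('NM_001037501')
GROUP BY exonnumber
ORDER BY exonnumber;
```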

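For genes with a large number of exon entries, such as Titin (363), it is slow, but in the general case the speed is acceptable.

I would suggest using a CTE in this case. Something like this: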

with exons (genename, exonnumber, chrom, exonstart, exonend, padding, direction) as
   (select genename, exonnumber, chrom, exonstart, exonend, padding, direction
    from exons
    where genename = 'NM_001037501')
select *
from vcfentries vcf
join exons exo
using (chrom)
where vcf.pos between exo.exonstart and exo.exonend;
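Whether the CTE form actually avoids the sequential scan on vcfentries has to be checked against the real data. In PostgreSQL 9.4 a CTE is planned separately from the outer query, so the planner sees a small intermediate result, but it may still choose a hash join. A sketch for inspecting the plan, using a hypothetical CTE name gene_exons to avoid shadowing the exons table:

```sql
-- Note: EXPLAIN ANALYZE executes the query, so this can take as long as the query itself.
-- gene_exons is a hypothetical name, not from the original post.
EXPLAIN (ANALYZE, BUFFERS)
WITH gene_exons AS (
    SELECT chrom, exonnumber, exonstart, exonend
    FROM exons
    WHERE genename = 'NM_001037501'
)
SELECT vcf.pos, vcf.chrom, vcf.alt, vcf.ref, exo.exonnumber
FROM gene_exons exo
JOIN vcfentries vcf
  ON vcf.chrom = exo.chrom
 AND vcf.pos BETWEEN exo.exonstart AND exo.exonend;
```

A plan showing a nested loop with an index scan on vcfentries, rather than "Seq Scan on vcfentries", would indicate the rewrite had the intended effect.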

Source: https://dba.stackexchange.com/questions/127255