Optimizing a query for rows where x falls between pairs of values in another table
PostgreSQL 9.4.5
My database has two tables, vcfentries and exons.
I want to select the rows of vcfentries whose pos lies between any of the (exonstart, exonend) pairs listed in the exons table.
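Fetching the exons for a single gene:
select * from exons where exons.genename = 'NM_001301824'
Total query runtime: 11 ms. 8 rows retrieved.
Fetching the vcfentries rows for a fixed position range:
select pos, chrom, alt, ref from vcfentries where chrom = '1' and pos > 33546679 and pos < 33547159
Total query runtime: 11 ms. 89 rows retrieved.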
Joining the two tables for a single exon:
select vcfentries.pos, vcfentries.chrom, vcfentries.alt, vcfentries.ref, exons.exonnumber
from exons, vcfentries
where vcfentries.pos between exons.exonstart and exons.exonend
  and exons.genename = 'NM_001301824'
  and exonnumber = 6
  and vcfentries.chrom = exons.chrom
Total query runtime: 72 ms. 137 rows retrieved.
Nested Loop  (cost=438468.31..2324377.64 rows=809587 width=24)
  ->  Index Only Scan using exonspkey on exons  (cost=0.42..8.44 rows=1 width=26)
        Index Cond: ((genename = 'NM_001301824'::text) AND (exonnumber = 8))
  ->  Bitmap Heap Scan on vcfentries  (cost=438467.89..2317823.87 rows=654532 width=16)
        Recheck Cond: ((pos >= exons.exonstart) AND (pos <= exons.exonend) AND (chrom = exons.chrom))
        ->  Bitmap Index Scan on vcfentries_pos_chrom_idx  (cost=0.00..438304.26 rows=654532 width=0)
              Index Cond: ((pos >= exons.exonstart) AND (pos <= exons.exonend) AND (chrom = exons.chrom))
Joining the two tables for every exon of a gene:
select *
from exons, vcfentries
where vcfentries.pos between exons.exonstart and exons.exonend
  and exons.genename = 'NM_001037501'
  and vcfentries.chrom = exons.chrom
Total query runtime: 325389 ms. 2331 rows retrieved.
Hash Join  (cost=58.73..11528494.14 rows=11334216 width=24)
  Output: vcfentries.pos, vcfentries.chrom, vcfentries.alt, vcfentries.ref, exons.exonnumber
  Hash Cond: (vcfentries.chrom = exons.chrom)
  Join Filter: ((vcfentries.pos >= exons.exonstart) AND (vcfentries.pos <= exons.exonend))
  ->  Seq Scan on coeus.vcfentries  (cost=0.00..7170736.76 rows=141378976 width=16)
        Output: vcfentries.pos, vcfentries.chrom, vcfentries.alt, vcfentries.ref, vcfentries.analysisid, vcfentries.filter, vcfentries.info_ac, vcfentries.info_af, vcfentries.info_an, vcfentries.info_baseqranksum, vcfentries.info_clippingranksum, vcfentrie (...)
  ->  Hash  (cost=58.56..58.56 rows=14 width=26)
        Output: exons.exonnumber, exons.exonstart, exons.exonend, exons.chrom
        ->  Bitmap Heap Scan on coeus.exons  (cost=4.53..58.56 rows=14 width=26)
              Output: exons.exonnumber, exons.exonstart, exons.exonend, exons.chrom
              Recheck Cond: (exons.genename = 'NM_001301824'::text)
              ->  Bitmap Index Scan on exons_genename_idx  (cost=0.00..4.53 rows=14 width=0)
                    Index Cond: (exons.genename = 'NM_001301824'::text)
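The exons table is moderately sized, at only about 400K rows.
The vcfentries table is large, with hundreds of millions of rows, but queries that use its index run at an acceptable speed.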
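One way to dig into the difference between the two plans is to compare estimated and actual row counts. The single-exon plan above estimates 654532 rows from vcfentries while the query returned only 137, so the planner may simply be pricing repeated index lookups higher than one sequential scan. The sketch below is diagnostic only: it turns off hash and merge joins for one transaction to show what a nested-loop plan actually costs for the whole gene. The settings are standard PostgreSQL planner options; that the nested loop is faster here is an assumption to verify, not a given.

```sql
BEGIN;
-- Diagnostic only: SET LOCAL limits these settings to this transaction,
-- so they are gone again after ROLLBACK.
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_mergejoin = off;

-- ANALYZE executes the query and prints actual row counts next to the estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT vcf.pos, vcf.chrom, vcf.alt, vcf.ref, ex.exonnumber
FROM coeus.exons AS ex
JOIN coeus.vcfentries AS vcf
  ON vcf.chrom = ex.chrom
 AND vcf.pos BETWEEN ex.exonstart AND ex.exonend
WHERE ex.genename = 'NM_001037501';
ROLLBACK;
```

If the nested-loop plan turns out to be fast, the problem is the row estimate rather than the index, and it is better addressed in the query than by leaving these settings off globally.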
The workaround I implemented is a plpgsql function that runs the single-exon query once per exon and returns the combined results as a table.
DROP FUNCTION IF EXISTS coeus.getexonvcf(text);

CREATE OR REPLACE FUNCTION coeus.getexonvcf(text)
  RETURNS SETOF coeus.exonvcf AS
$BODY$
DECLARE
  r bigint;         -- exon number currently being processed
  s coeus.exonvcf;  -- one output row
BEGIN
  -- Outer loop: every exon of the gene passed in as $1.
  FOR r IN
    SELECT exonnumber FROM coeus.exons WHERE exons.genename = $1
  LOOP
    -- Inner loop: the fast single-exon join, restricted to exon r.
    FOR s IN
      SELECT ex.genename, ex.exonnumber, ex.direction, ex.padding, vcf.*
      FROM coeus.exons ex, coeus.vcfentries vcf
      WHERE vcf.pos BETWEEN ex.exonstart AND ex.exonend
        AND ex.genename = $1
        AND ex.exonnumber = r
        AND vcf.chrom = ex.chrom
    LOOP
      RETURN NEXT s;
    END LOOP;
  END LOOP;
  RETURN;
END
$BODY$
LANGUAGE plpgsql;
Total query runtime: 1493 ms. 2331 rows retrieved.
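For reference, a set-returning function like this is called in the FROM clause. The call below is a sketch that assumes the gene identifier from the earlier multi-exon query (which also returned 2331 rows) and that coeus.exonvcf exposes the exonnumber, pos, ref and alt columns selected inside the function.

```sql
-- Return every matching row for one gene.
SELECT *
FROM coeus.getexonvcf('NM_001037501');

-- Or pick columns and sort, treating the function like a table.
-- Column names are assumed to match the coeus.exonvcf composite type.
SELECT exonnumber, pos, ref, alt
FROM coeus.getexonvcf('NM_001037501')
ORDER BY exonnumber, pos;
```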
For genes with many exon entries, such as Titin (363 exons), it is slow, but in the typical case the speed is acceptable.
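The per-exon loop can also be written as a single statement with a LATERAL subquery, which PostgreSQL supports from 9.3 on. This is only a sketch of an alternative, not something measured against this data. The OFFSET 0 is a commonly used trick to stop the planner from flattening the subquery into an ordinary join, so the subquery is evaluated once per exon row, much like the function's inner loop. Whether each evaluation uses the index still depends on the planner.

```sql
SELECT ex.genename, ex.exonnumber, v.pos, v.chrom, v.alt, v.ref
FROM coeus.exons AS ex
CROSS JOIN LATERAL (
    -- Runs once per exon row, with ex.chrom / ex.exonstart / ex.exonend as parameters.
    SELECT vcf.pos, vcf.chrom, vcf.alt, vcf.ref
    FROM coeus.vcfentries AS vcf
    WHERE vcf.chrom = ex.chrom
      AND vcf.pos BETWEEN ex.exonstart AND ex.exonend
    OFFSET 0  -- keeps the subquery from being pulled up into a plain join
) AS v
WHERE ex.genename = 'NM_001037501';
```

Running it under EXPLAIN ANALYZE should show a Nested Loop over a Subquery Scan if the fence works as intended.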
I would suggest using a CTE in this case. Something like this:
with exons (genename, exonnumber, chrom, exonstart, exonend, padding, direction) as (
  select genename, exonnumber, chrom, exonstart, exonend, padding, direction
  from exons
  where genename = 'NM_001037501'
)
select *
from vcfentries vcf
join exons exo using (chrom)
where vcf.pos between exo.exonstart and exo.exonend;
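In PostgreSQL releases before 12, which includes 9.4, a WITH query is always computed separately and acts as an optimization fence, so the exon list is fetched first and then joined. Whether the planner then probes the vcfentries index per exon, rather than hashing on chrom again, depends on its estimates, so it is worth checking the plan. The sketch below wraps the same query in EXPLAIN; the CTE name gene_exons is a hypothetical rename that avoids shadowing the exons table, and the coeus schema is taken from the plans in the question.

```sql
EXPLAIN (ANALYZE, BUFFERS)
WITH gene_exons AS (
    -- Computed on its own first; only the rows for this gene leave the CTE.
    SELECT genename, exonnumber, chrom, exonstart, exonend, padding, direction
    FROM coeus.exons
    WHERE genename = 'NM_001037501'
)
SELECT *
FROM coeus.vcfentries AS vcf
JOIN gene_exons AS exo USING (chrom)
WHERE vcf.pos BETWEEN exo.exonstart AND exo.exonend;
```

A Seq Scan on vcfentries in the output would mean the CTE alone has not changed the join strategy.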