為多個 WHERE EXISTS 子查詢創建索引
我必須找到一種有效的方法來為我正在開發的應用程序編寫一個非常具體的查詢。
想法如下:找出所有基因型值為“1”的每個物種至少存在一個植物系的遺傳標記。
所以我有,比如說,3種植物:“species_1”、“species_2”和“species_3”。
我有這些表:(
accessions
植物系)、markers
(遺傳標記)、genotypes
(基因型;每個標記、登錄和數據集一個)、datasets
(將基因型值細分為數據集)和taxonomies
(植物物種)。我想出了這個查詢:
SELECT markers.*, 1 AS "species_1", 1 AS "species_2", 1 AS "species_3" FROM markers LEFT JOIN genotypes g ON g.marker_id = markers.id WHERE g.dataset_id = 3 AND EXISTS ( SELECT 1 FROM genotypes LEFT JOIN accession ON genotypes.accession_id = accession.id LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id WHERE genotypes.marker_id = markers.id AND genotypes.a = 1 AND taxonomies.genus = "genus1" AND taxonomies.species = "species1" ) AND EXISTS ( SELECT 1 FROM genotypes LEFT JOIN accession ON genotypes.accession_id = accession.id LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id WHERE genotypes.marker_id = markers.id AND genotypes.a = 1 AND taxonomies.genus = "genus2" AND taxonomies.species = "species2" ) AND EXISTS ( SELECT 1 FROM genotypes LEFT JOIN accession ON genotypes.accession_id = accession.id LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id WHERE genotypes.marker_id = markers.id AND genotypes.a = 1 AND taxonomies.genus = "genus3" AND taxonomies.species = "species3" )
注意:如果您想知道這些
SELECT 1 AS
位,這僅僅是因為我還執行了對0
and的查詢NOT EXISTS
。所以這只是一個範例查詢。
genotypes
它返回正確的結果,但在大約有 200 萬行的表上速度很慢。我查看
EXPLAIN
了查詢並嘗試生成索引,以便每個子查詢都有一個要使用的索引。確實提高了性能的指數是 ongenotypes.marker_id, genotypes.dataset_id, genotypes.a
。但是,查詢需要大約 7-8 秒才能返回結果,我認為如果我能弄清楚要使用哪個其他索引或如何重組查詢,它可以執行得更快。
有人對如何提高性能有任何建議嗎?
編輯:
CREATE TABLE `genotypes` ( `id` BIGINT (20) NOT NULL AUTO_INCREMENT, `marker_id` INT (11) NOT NULL, `dataset_id` INT (11) NOT NULL, `accession_id` INT (11) NOT NULL, `a` VARCHAR (3) DEFAULT NULL, PRIMARY KEY (`id`), KEY `marker_id` (`marker_id`) USING BTREE, KEY `dataset_id` (`dataset_id`) USING BTREE, KEY `accession_id` (`accession_id`) USING BTREE, KEY `genotypes_marker_dataset_allele1` ( `marker_id`, `dataset_id`, `a` ) USING BTREE, KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE, CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE, CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE, CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE ) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1 CREATE TABLE `markers` ( `id` INT (11) NOT NULL AUTO_INCREMENT, `marker_name` VARCHAR (45) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1 CREATE TABLE `accessions` ( `id` INT (11) NOT NULL AUTO_INCREMENT, `name` VARCHAR (255) NOT NULL, `taxonomy_id` INT (11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `taxonomy_id` (`taxonomy_id`) USING BTREE, CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE, ) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1 CREATE TABLE `taxonomies` ( `id` INT (11) NOT NULL AUTO_INCREMENT, `genus` VARCHAR (255) NOT NULL DEFAULT '', `species` VARCHAR (255) NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `taxonomies_genus` (`genus`) USING BTREE, KEY `taxonomies_species` (`species`) USING BTREE, KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE ) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1 id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary 1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct 6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where 6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where 6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where 5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where 5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where 5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where 4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where 4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where 4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where 3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where 3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where 3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where 2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where 2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where 2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
我會嘗試索引 on
(a, marker_id, accession_id)
和 on(a, accession_id, marker_id)
。由於這a
是一個 varchar,所以使用字元串文字,而不是數字:where a = '1'
。如果儲存的值只是數字,則將其從 varchar 轉換為適當的數字類型(tinyiny、smallint)。-超立方體
g.dataset_id = 3 正在殺死左邊
試試這個
FROM markers JOIN genotypes g ON g.marker_id = markers.id AND g.dataset_id = 3
where 正在殺死存在中的左連接,因此您可以刪除它
這可能會給您帶來更好的性能
and exists ( SELECT 1 FROM genotypes join accession ON genotypes.accession_id = accession.id AND genotypes.a = 1 AND genotypes.marker_id = markers.id join taxonomies ON taxonomies.id = accession.taxonomy_id AND taxonomies.genus = "genus1" AND taxonomies.species = "species1" ) and exists ( SELECT 1 FROM genotypes join accession ON genotypes.accession_id = accession.id AND genotypes.a = 1 AND genotypes.marker_id = markers.id join taxonomies ON taxonomies.id = accession.taxonomy_id AND taxonomies.genus = "genus2" AND taxonomies.species = "species2" ) and exists ( SELECT 1 FROM genotypes join accession ON genotypes.accession_id = accession.id AND genotypes.a = 1 AND genotypes.marker_id = markers.id join taxonomies ON taxonomies.id = accession.taxonomy_id AND taxonomies.genus = "genus3" AND taxonomies.species = "species3" )
ypercube 是正確的,您不能將所有 3 結合起來
這意味著在同一個 accession.taxonomy_id 上必須相等
SELECT 1 FROM genotypes join accession ON genotypes.accession_id = accession.id AND genotypes.a = 1 AND genotypes.marker_id = markers.id join taxonomies as t1 ON t1.id = accession.taxonomy_id AND t1.genus = "genus1" AND t1.species = "species1" join taxonomies as t2 ON t2.id = accession.taxonomy_id AND t2.genus = "genus2" AND t2.species = "species2" join taxonomies as t3 ON t3.id = accession.taxonomy_id AND t3.genus = "genus3" AND t3.species = "species3"