Mysql

為多個 WHERE EXISTS 子查詢創建索引

  • November 24, 2019

我必須找到一種有效的方法來為我正在開發的應用程序編寫一個非常具體的查詢。

想法如下:找出所有基因型值為“1”的每個物種至少存在一個植物系的遺傳標記。

所以我有,比如說,3種植物:“species_1”、“species_2”和“species_3”。

我有這些表:(accessions植物系)、markers(遺傳標記)、genotypes(基因型;每個標記、登錄和數據集一個)、datasets(將基因型值細分為數據集)和taxonomies(植物物種)。

我想出了這個查詢:

SELECT
   markers.*, 1 AS "species_1",
   1 AS "species_2",
   1 AS "species_3"
FROM
   markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
   g.dataset_id = 3
AND EXISTS (
   SELECT
       1
   FROM
       genotypes
   LEFT JOIN accession ON genotypes.accession_id = accession.id
   LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
   WHERE
       genotypes.marker_id = markers.id
   AND genotypes.a = 1
   AND taxonomies.genus = "genus1"
   AND taxonomies.species = "species1"
)
AND EXISTS (
   SELECT
       1
   FROM
       genotypes
   LEFT JOIN accession ON genotypes.accession_id = accession.id
   LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
   WHERE
       genotypes.marker_id = markers.id
   AND genotypes.a = 1
   AND taxonomies.genus = "genus2"
   AND taxonomies.species = "species2"
)
AND EXISTS (
   SELECT
       1
   FROM
       genotypes
   LEFT JOIN accession ON genotypes.accession_id = accession.id
   LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
   WHERE
       genotypes.marker_id = markers.id
   AND genotypes.a = 1
   AND taxonomies.genus = "genus3"
   AND taxonomies.species = "species3"
)

注意:如果您想知道這些SELECT 1 AS位,這僅僅是因為我還執行了對0and的查詢NOT EXISTS。所以這只是一個範例查詢。

genotypes它返回正確的結果,但在大約有 200 萬行的表上速度很慢。

我查看EXPLAIN了查詢並嘗試生成索引,以便每個子查詢都有一個要使用的索引。確實提高了性能的指數是 on genotypes.marker_id, genotypes.dataset_id, genotypes.a

但是,查詢需要大約 7-8 秒才能返回結果,我認為如果我能弄清楚要使用哪個其他索引或如何重組查詢,它可以執行得更快。

有人對如何提高性能有任何建議嗎?


編輯:

CREATE TABLE `genotypes` (
   `id` BIGINT (20) NOT NULL AUTO_INCREMENT,
   `marker_id` INT (11) NOT NULL,
   `dataset_id` INT (11) NOT NULL,
   `accession_id` INT (11) NOT NULL,
   `a` VARCHAR (3) DEFAULT NULL,
   PRIMARY KEY (`id`),
   KEY `marker_id` (`marker_id`) USING BTREE,
   KEY `dataset_id` (`dataset_id`) USING BTREE,
   KEY `accession_id` (`accession_id`) USING BTREE,
   KEY `genotypes_marker_dataset_allele1` (
       `marker_id`,
       `dataset_id`,
       `a`
   ) USING BTREE,
   KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
   CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
   CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
   CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1

CREATE TABLE `markers` (
   `id` INT (11) NOT NULL AUTO_INCREMENT,
   `marker_name` VARCHAR (45) DEFAULT NULL,
   PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1

CREATE TABLE `accessions` (
   `id` INT (11) NOT NULL AUTO_INCREMENT,
   `name` VARCHAR (255) NOT NULL,
   `taxonomy_id` INT (11) DEFAULT NULL,
   PRIMARY KEY (`id`),
   KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
   CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,

) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1

CREATE TABLE `taxonomies` (
   `id` INT (11) NOT NULL AUTO_INCREMENT,
   `genus` VARCHAR (255) NOT NULL DEFAULT '',
   `species` VARCHAR (255) NOT NULL DEFAULT '',
   PRIMARY KEY (`id`),
   KEY `taxonomies_genus` (`genus`) USING BTREE,
   KEY `taxonomies_species` (`species`) USING BTREE,
   KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   PRIMARY markers ALL PRIMARY             8803    Using where; Using temporary
1   PRIMARY g   ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1  genotypes_marker_dataset_allele1    8   markers.id,const    104 Using index; Distinct
6   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where
6   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where
6   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where
5   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where
5   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where
5   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where
4   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where
4   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where
4   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where
3   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where
3   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where
3   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where
2   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where
2   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where
2   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

我會嘗試索引 on(a, marker_id, accession_id)和 on (a, accession_id, marker_id)。由於這a是一個 varchar,所以使用字元串文字,而不是數字:where a = '1'。如果儲存的值只是數字,則將其從 varchar 轉換為適當的數字類型(tinyiny、smallint)。-超立方體

g.dataset_id = 3 正在殺死左邊

試試這個

FROM markers
JOIN genotypes g 
     ON g.marker_id = markers.id
    AND g.dataset_id = 3

where 正在殺死存在中的左連接,因此您可以刪除它

這可能會給您帶來更好的性能

and exists 
(  SELECT 1
  FROM genotypes
  join accession 
        ON genotypes.accession_id = accession.id  
       AND genotypes.a = 1 
       AND genotypes.marker_id = markers.id
  join taxonomies 
        ON taxonomies.id = accession.taxonomy_id
       AND taxonomies.genus   = "genus1"
       AND taxonomies.species = "species1"  
)
and exists 
(  SELECT 1
  FROM genotypes
  join accession 
        ON genotypes.accession_id = accession.id  
       AND genotypes.a = 1 
       AND genotypes.marker_id = markers.id
  join taxonomies 
        ON taxonomies.id = accession.taxonomy_id
       AND taxonomies.genus   = "genus2"
       AND taxonomies.species = "species2"  
)
and exists 
(  SELECT 1
  FROM genotypes
  join accession 
        ON genotypes.accession_id = accession.id  
       AND genotypes.a = 1 
       AND genotypes.marker_id = markers.id
  join taxonomies 
        ON taxonomies.id = accession.taxonomy_id
       AND taxonomies.genus   = "genus3"
       AND taxonomies.species = "species3"  
)

ypercube 是正確的,您不能將所有 3 結合起來

這意味著在同一個 accession.taxonomy_id 上必須相等

SELECT 1
FROM genotypes
join accession 
     ON genotypes.accession_id = accession.id  
    AND genotypes.a = 1 
    AND genotypes.marker_id = markers.id
join taxonomies as t1 
     ON t1.id = accession.taxonomy_id
    AND t1.genus   = "genus1"
    AND t1.species = "species1"
join taxonomies as t2 
     ON t2.id = accession.taxonomy_id
    AND t2.genus   = "genus2"
    AND t2.species = "species2"
join taxonomies as t3 
     ON t3.id = accession.taxonomy_id
    AND t3.genus   = "genus3"
    AND t3.species = "species3"

引用自:https://dba.stackexchange.com/questions/140702