為什麼查詢變得非常慢,與數據量無關?
我有一個查詢通常在幾秒鐘/幾分鐘內執行,一段時間(大約一周)後變得非常慢,然後幾天!執行。查詢只停留在“發送數據”中,CPU 使用率為 100%。伺服器是 Mariadb 10.4,系統正在執行許多其他復雜的查詢而沒有問題,只有這個特定的查詢似乎遇到了一些伺服器限製或性能錯誤。
數據量似乎並不相關,因為查詢在不同的數據庫上執行,這些數據庫是為具有不同記錄量的每個特定項目創建和刪除的,但即使是較小的項目也會出現問題。
重新啟動伺服器會使查詢再次快速執行一段時間,但問題會一遍又一遍地出現。在伺服器達到其允許的最大 RAM 量之前,該問題似乎不會發生,即使伺服器上仍有可用 RAM 可供使用(我專門減小了緩衝區大小以對其進行測試)。一旦問題出現,InnoDB 和 MyISAM 引擎都會發生這種情況。由於在伺服器重新啟動後查詢執行得非常快,因此它似乎不是缺少索引等問題。有什麼提示會導致該行為以及如何調查/解決?
以下是查詢:
CREATE TABLE counts_otus ( _sample_id INT, _region_sample_id INT, sequencesPerOtu INT, PRIMARY KEY (_region_sample_id), INDEX (_sample_id) ) ENGINE=InnoDB AS SELECT _sample_map._sample_id, _sample_map._region_sample_id, ( SELECT COUNT(*) FROM cluster AS otu WHERE otu._cluster_sample_id = _sample_map._region_sample_id ) + ( SELECT count(*) FROM cluster AS otu INNER JOIN cluster AS mem ON otu._region_sample_id = mem._cluster_sample_id WHERE otu._cluster_sample_id = _sample_map._region_sample_id ) + 1 AS sequencesPerOtu FROM Region INNER JOIN _sample_map USING (primaryAccession) INNER JOIN sample USING (_sample_id) WHERE regionTag is NULL AND sampleTag is NULL AND sample_type <> 'otumap' ;
查詢計劃確實不同,這可能是解決問題的決定因素:快速執行時的計劃是
+------+--------------------+-------------+------+-------------------------------------------------+----------------+---------+------------------------------------------------------------------------+-------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+-------------+------+-------------------------------------------------+----------------+---------+------------------------------------------------------------------------+-------+--------------------------+ | 1 | PRIMARY | sample | ALL | PRIMARY,id_sample_type | NULL | NULL | NULL | 10 | Using where | | 1 | PRIMARY | _sample_map | ref | fk_sset_seqent,fk_sset_sample,fk_sset_smapleTag | fk_sset_sample | 4 | silvangs_slv_main_pid23875_rid26315.sample._sample_id | 52186 | Using where | | 1 | PRIMARY | Region | ref | PRIMARY,fk_rgnTag | fk_rgnTag | 100 | const,silvangs_slv_main_pid23875_rid26315._sample_map.primaryAccession | 1 | Using where; Using index | | 3 | DEPENDENT SUBQUERY | otu | ref | PRIMARY,id_cluster | id_cluster | 4 | silvangs_slv_main_pid23875_rid26315._sample_map._region_sample_id | 1 | Using index | | 3 | DEPENDENT SUBQUERY | mem | ref | id_cluster | id_cluster | 4 | silvangs_slv_main_pid23875_rid26315.otu._region_sample_id | 1 | Using index | | 2 | DEPENDENT SUBQUERY | otu | ref | id_cluster | id_cluster | 4 | silvangs_slv_main_pid23875_rid26315._sample_map._region_sample_id | 1 | Using index | +------+--------------------+-------------+------+-------------------------------------------------+----------------+---------+------------------------------------------------------------------------+-------+--------------------------+
執行速度極慢時的計劃(殺死正在執行的查詢並隨後對其選擇進行解釋:
+------+--------------------+-------------+--------+-------------------------------------------------+----------------+---------+------------------------------------------------------------------------+--------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+-------------+--------+-------------------------------------------------+----------------+---------+------------------------------------------------------------------------+--------+--------------------------+ | 1 | PRIMARY | sample | ALL | PRIMARY,id_sample_type | NULL | NULL | NULL | 10 | Using where | | 1 | PRIMARY | _sample_map | ref | fk_sset_seqent,fk_sset_sample,fk_sset_smapleTag | fk_sset_sample | 4 | silvangs_slv_main_pid23875_rid26315.sample._sample_id | 41361 | Using where | | 1 | PRIMARY | Region | ref | PRIMARY,fk_rgnTag | fk_rgnTag | 100 | const,silvangs_slv_main_pid23875_rid26315._sample_map.primaryAccession | 1 | Using where; Using index | | 3 | DEPENDENT SUBQUERY | mem | index | id_cluster | id_cluster | 4 | NULL | 738041 | Using index | | 3 | DEPENDENT SUBQUERY | otu | eq_ref | PRIMARY,id_cluster | PRIMARY | 4 | silvangs_slv_main_pid23875_rid26315.mem._cluster_sample_id | 1 | Using where | | 2 | DEPENDENT SUBQUERY | otu | ref | id_cluster | id_cluster | 4 | silvangs_slv_main_pid23875_rid26315._sample_map._region_sample_id | 57226 | Using index | +------+--------------------+-------------+--------+-------------------------------------------------+----------------+---------+------------------------------------------------------------------------+--------+--------------------------+
因此,執行緩慢時不僅有“ref”連接類型,還有“index”和“eq_ref”,據我所知,它們應該會更好,但最終會卡住幾天。
該問題最初發佈到stackoverflow,在那裡我得到建議它更適合在dba,這裡是該問題的連結:https ://stackoverflow.com/questions/60952661/why-does-a-query-becomes-與數據量無關的極慢速度
根據 mysql 文件(https://dev.mysql.com/doc/refman/5.7/en/controlling-query-plan-evaluation.html),錯誤的查詢計劃似乎確實可以產生如此數量級的差異作為秒與天,所以我認為問題在於優化器選擇了錯誤的查詢計劃。為什麼在伺服器執行一段時間後經常發生這種情況(並且其緩衝區的可用記憶體已完全分配)仍然是一個謎,但是解決方案似乎在於向優化器提供提示以避免錯誤的連接順序並使用引用的索引在好的查詢中。這是通過如下更改查詢獲得的:
DROP TABLE IF EXISTS test_counts_otus; CREATE TABLE test_counts_otus ( _sample_id INT, _region_sample_id INT, sequencesPerOtu INT, PRIMARY KEY (_region_sample_id), INDEX (_sample_id) ) ENGINE=InnoDB AS SELECT _sample_map._sample_id, _sample_map._region_sample_id, ( SELECT COUNT(*) FROM cluster AS otu FORCE INDEX ( id_cluster ) WHERE otu._cluster_sample_id = _sample_map._region_sample_id ) + ( SELECT count(*) FROM cluster AS otu FORCE INDEX ( id_cluster ) STRAIGHT_JOIN cluster AS mem FORCE INDEX ( id_cluster ) ON otu._region_sample_id = mem._cluster_sample_id WHERE otu._cluster_sample_id = _sample_map._region_sample_id ) + 1 AS sequencesPerOtu FROM Region INNER JOIN _sample_map USING (primaryAccession) INNER JOIN sample USING (_sample_id) WHERE regionTag is NULL AND sampleTag is NULL AND sample_type <> 'otumap' ;
固定查詢使用強制索引和 STRAIGHT_JOIN,如https://mariadb.com/kb/en/index-hints-how-to-force-query-plans/中所述