Mysql

大表的錯誤查詢執行計劃

  • February 10, 2018

問題:我們遇到大型表(+40M 行)頻繁更新其索引列且選擇性低的問題。似乎在一段時間後經常更新/刪除列時,優化器開始選擇錯誤的執行計劃。在我看來,問題可能在於提供給優化器(例如 rec_per_key)的過時統計資訊(或沒有)。這是一個例子:

MariaDB [test]> select version();
+----------------------------------+
| version()                        |
+----------------------------------+
| 10.0.33-MariaDB-0ubuntu0.16.04.1 |
+----------------------------------+ 


MariaDB [test]> show create table t1 \G;
   *************************** 1. row ***************************
          Table: t1
   Create Table: CREATE TABLE `t1` (
     `id` int(11) NOT NULL AUTO_INCREMENT,
     `status` enum('a','b','c','d','e') DEFAULT 'a',
     PRIMARY KEY (`id`),
     KEY `status` (`status`)
   ) ENGINE=InnoDB AUTO_INCREMENT=12001 DEFAULT CHARSET=utf8mb4

MariaDB [test]> show index from t1 \G;
*************************** 1. row ***************************
       Table: t1
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 1
 Column_name: id
   Collation: A
 Cardinality: 11980
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment: 
Index_comment: 
*************************** 2. row ***************************
       Table: t1
  Non_unique: 1
    Key_name: status
Seq_in_index: 1
 Column_name: status
   Collation: A
 Cardinality: 15
    Sub_part: NULL
      Packed: NULL
        Null: YES
  Index_type: BTREE
     Comment: 
Index_comment:

MariaDB [test]> select index_name,last_update,stat_name,stat_value,sample_size,stat_description from mysql.innodb_index_stats where database_name='test' AND table_name='t1';
+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| PRIMARY    | 2018-01-30 18:31:07 | n_diff_pfx01 |      11555 |          20 | id                                |
| PRIMARY    | 2018-01-30 18:31:07 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| PRIMARY    | 2018-01-30 18:31:07 | size         |         21 |        NULL | Number of pages in the index      |
| status     | 2018-01-30 18:31:07 | n_diff_pfx01 |          4 |          10 | status                            |
| status     | 2018-01-30 18:31:07 | n_diff_pfx02 |      11575 |          10 | status,id                         |
| status     | 2018-01-30 18:31:07 | n_leaf_pages |         10 |        NULL | Number of leaf pages in the index |
| status     | 2018-01-30 18:31:07 | size         |         11 |        NULL | Number of pages in the index      |
+------------+---------------------+--------------+------------+-------------+-----------------------------------+

MariaDB [test]> select count(distinct(status)) from t1;
+-------------------------+
| count(distinct(status)) |
+-------------------------+
|                       4 |
+-------------------------+

MariaDB [test]> explain select * from t1 where status='a';
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id   | select_type | table | type | possible_keys | key    | key_len | ref   | rows | Extra                    |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | status        | status | 2       | const | 5038 | Using where; Using index |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+

MariaDB [test]> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 5039  |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

MariaDB [test]> select count(*) from t1 where status='a';
+----------+
| count(*) |
+----------+
|     5039 |
+----------+

MariaDB [test]> update t1 set status='b' where status='a' limit 1000;
Query OK, 1000 rows affected (0.22 sec)
Rows matched: 1000  Changed: 1000  Warnings: 0

MariaDB [test]> select count(*) from t1 where status='a';
+----------+
| count(*) |
+----------+
|     4039 |
+----------+
1 row in set (0.01 sec)

MariaDB [test]> explain select * from t1 where status='a';
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id   | select_type | table | type | possible_keys | key    | key_len | ref   | rows | Extra                    |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | status        | status | 2       | const | 5038 | Using where; Using index |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)

MariaDB [test]> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 4039  |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.01 sec)

MariaDB [test]> select index_name,last_update,stat_name,stat_value,sample_size,stat_description from mysql.innodb_index_stats where database_name='test' AND table_name='t1';
+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| PRIMARY    | 2018-01-30 18:52:25 | n_diff_pfx01 |      11980 |          20 | id                                |
| PRIMARY    | 2018-01-30 18:52:25 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| PRIMARY    | 2018-01-30 18:52:25 | size         |         21 |        NULL | Number of pages in the index      |
| status     | 2018-01-30 18:52:25 | n_diff_pfx01 |          4 |          12 | status                            |
| status     | 2018-01-30 18:52:25 | n_diff_pfx02 |      12058 |          12 | status,id                         |
| status     | 2018-01-30 18:52:25 | n_leaf_pages |         12 |        NULL | Number of leaf pages in the index |
| status     | 2018-01-30 18:52:25 | size         |         13 |        NULL | Number of pages in the index      |
+------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)

MariaDB [test]> show index from t1 \G;
*************************** 1. row ***************************
       Table: t1
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 1
 Column_name: id
   Collation: A
 Cardinality: 11980
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment: 
Index_comment: 
*************************** 2. row ***************************
       Table: t1
  Non_unique: 1
    Key_name: status
Seq_in_index: 1
 Column_name: status
   Collation: A
 Cardinality: 15
    Sub_part: NULL
      Packed: NULL
        Null: YES
  Index_type: BTREE
     Comment: 
Index_comment:



MariaDB [test]> show variables like 'innodb_stat%';
+--------------------------------------+-------------+
| Variable_name                        | Value       |
+--------------------------------------+-------------+
| innodb_stats_auto_recalc             | ON          |
| innodb_stats_include_delete_marked   | OFF         |
| innodb_stats_method                  | nulls_equal |
| innodb_stats_modified_counter        | 0           |
| innodb_stats_on_metadata             | OFF         |
| innodb_stats_persistent              | ON          |
| innodb_stats_persistent_sample_pages | 20          |
| innodb_stats_sample_pages            | 8           |
| innodb_stats_traditional             | ON          |
| innodb_stats_transient_sample_pages  | 8           |
| innodb_status_output                 | OFF         |
| innodb_status_output_locks           | OFF         |
+--------------------------------------+-------------+

分析表沒有幫助。甚至沒有將 innodb_stats_persistent_sample_pages 更改為更高的值。到目前為止,唯一的解決方案是重建表(alter、mysqldump/recover)或使用/強制索引。

在我們的現實世界場景中,優化器(至少通過解釋顯示)假設它必須檢查 1 600 000 行,而實際上處理程序統計顯示只檢查了 400 行。

任何幫助將非常感激。

編輯- 詳細闡述Rick 的 文章

**1.**關於“基數”:優化器是否假設所有值都同樣頻繁地出現?你的意思是,如果我有一個有 20 行的表和基數為 4 的索引,它希望在每第四行找到相同的值?

MariaDB [test]> select count(*) from t1;
+----------+
| count(*) |
+----------+
|    12000 |
+----------+

MariaDB [test]> select count(*) from t1 where status='a';
+----------+
| count(*) |
+----------+
|     3039 |
+----------+

MariaDB [test]> explain select status from t1 where status='a';
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
| id   | select_type | table | type | possible_keys | key    | key_len | ref   | rows | Extra                    |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | status        | status | 2       | const | 3038 | Using where; Using index |
+------+-------------+-------+------+---------------+--------+---------+-------+------+--------------------------+

Explain顯示檢查的行數為 3038,因此它非常清楚必須檢查多少行。對我來說,它看起來不像假設所有值都同樣頻繁地出現。也許我只是誤解了你的解釋:)

**2.**關於Select使用索引:重點是顯示rows examined索引列頻繁更新時的差異。

**3.**關於Explain輸出:我同意它經常不准確。

**4.**關於poor index choice:我同意,但不幸的是,我們現在無法更改。

5. 關於Select count():重點只是顯示有多少行與 where 子句匹配。

另一個或許更好的例子來說明我們如何受到這個問題的影響。

表格1

MariaDB [test]> show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                               |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `status` enum('a','b','c','d','e') DEFAULT 'a',
 PRIMARY KEY (`id`),
 KEY `status` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=12001 DEFAULT CHARSET=utf8mb4 |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

表 2

MariaDB [test]> show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                               |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `status` enum('a','b','c','d','e') DEFAULT 'a',
 PRIMARY KEY (`id`),
 KEY `status` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=12001 DEFAULT CHARSET=utf8mb4 |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

表 t1 的索引

MariaDB [test]> select * from mysql.innodb_index_stats where database_name='test' and table_name='t1';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test          | t1         | PRIMARY    | 2018-01-31 12:28:17 | n_diff_pfx01 |      11980 |          20 | id                                |
| test          | t1         | PRIMARY    | 2018-01-31 12:28:17 | n_leaf_pages |         20 |        NULL | Number of leaf pages in the index |
| test          | t1         | PRIMARY    | 2018-01-31 12:28:17 | size         |         21 |        NULL | Number of pages in the index      |
| test          | t1         | status     | 2018-01-31 12:28:17 | n_diff_pfx01 |          4 |           9 | status                            |
| test          | t1         | status     | 2018-01-31 12:28:17 | n_diff_pfx02 |      12000 |           9 | status,id                         |
| test          | t1         | status     | 2018-01-31 12:28:17 | n_leaf_pages |          9 |        NULL | Number of leaf pages in the index |
| test          | t1         | status     | 2018-01-31 12:28:17 | size         |         10 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+

***Indexes for table t2***

MariaDB [test]> select * from mysql.innodb_index_stats where database_name='test' and table_name='t2';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test          | t2         | PRIMARY    | 2018-01-31 12:32:54 | n_diff_pfx01 |       4000 |          14 | id                                |
| test          | t2         | PRIMARY    | 2018-01-31 12:32:54 | n_leaf_pages |         14 |        NULL | Number of leaf pages in the index |
| test          | t2         | PRIMARY    | 2018-01-31 12:32:54 | size         |         15 |        NULL | Number of pages in the index      |
| test          | t2         | fk_t1_id   | 2018-01-31 12:32:54 | n_diff_pfx01 |        984 |           5 | t1_id                             |
| test          | t2         | fk_t1_id   | 2018-01-31 12:32:54 | n_diff_pfx02 |       4000 |           5 | t1_id,id                          |
| test          | t2         | fk_t1_id   | 2018-01-31 12:32:54 | n_leaf_pages |          5 |        NULL | Number of leaf pages in the index |
| test          | t2         | fk_t1_id   | 2018-01-31 12:32:54 | size         |          6 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+

t1 中的行數

MariaDB [test]> select count(*) from t1;
+----------+
| count(*) |
+----------+
|    12000 |
+----------+

t2 中的行數

MariaDB [test]> select count(*) from t2;
+----------+
| count(*) |
+----------+
|     4000 |
+----------+

有問題的查詢

MariaDB [test]> explain select sql_no_cache status from t1 join t2 on t1.id = t2.t1_id where status='a' \G;
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t2
        type: index
possible_keys: fk_t1_id
         key: fk_t1_id
     key_len: 5
         ref: NULL
        rows: 4000
       Extra: Using where; Using index
*************************** 2. row ***************************
          id: 1
 select_type: SIMPLE
       table: t1
        type: eq_ref
possible_keys: PRIMARY,status
         key: PRIMARY
     key_len: 4
         ref: test.t2.t1_id
        rows: 1
       Extra: Using where

現在,我們用b值更新 1000 行。之後它應該使用狀態索引,因為檢查 3039 行的成本更低。

MariaDB [test]> update t1 set status='b' where status='a' limit 1000;
Query OK, 1000 rows affected (0.44 sec)
Rows matched: 1000  Changed: 1000  Warnings: 0

MariaDB [test]> select count(*) from t1 where status='a';
+----------+
| count(*) |
+----------+
|     3039 |
+----------+

MariaDB [test]> explain select sql_no_cache status from t1 where status='a' \G;
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t1
        type: ref
possible_keys: status
         key: status
     key_len: 2
         ref: const
        rows: 4038
       Extra: Using where; Using index

MariaDB [test]> explain select sql_no_cache status from t1 join t2 on t1.id = t2.t1_id where status='a' \G;
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t2
        type: index
possible_keys: fk_t1_id
         key: fk_t1_id
     key_len: 5
         ref: NULL
        rows: 4000
       Extra: Using where; Using index
*************************** 2. row ***************************
          id: 1
 select_type: SIMPLE
       table: t1
        type: eq_ref
possible_keys: PRIMARY,status
         key: PRIMARY
     key_len: 4
         ref: test.t2.t1_id
        rows: 1
       Extra: Using where

如您所見,查詢執行計劃似乎是相同的,它仍然假定必須檢查 4000 行。

MariaDB [test]> select sql_no_cache status from t1 join t2 on t1.id = t2.t1_id where status='a';
Empty set (0.00 sec)

MariaDB [test]> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 984   |
| Handler_read_last          | 0     |
| Handler_read_next          | 4000  |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

現在讓我們更改表 t1。

MariaDB [test]> alter table t1 engine=innodb;
Query OK, 0 rows affected (1.60 sec)
Records: 0  Duplicates: 0  Warnings: 0

執行計劃發生了變化。

MariaDB [test]> explain select sql_no_cache status from t1 where status='a' \G;
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t1
        type: ref
possible_keys: status
         key: status
     key_len: 2
         ref: const
        rows: 3038
       Extra: Using where; Using index

MariaDB [test]> explain select sql_no_cache status from t1 join t2 on t1.id = t2.t1_id where status='a' \G;
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: t1
        type: ref
possible_keys: PRIMARY,status
         key: status
     key_len: 2
         ref: const
        rows: 3038
       Extra: Using where; Using index
*************************** 2. row ***************************
          id: 1
 select_type: SIMPLE
       table: t2
        type: ref
possible_keys: fk_t1_id
         key: fk_t1_id
     key_len: 5
         ref: test.t1.id
        rows: 2
       Extra: Using index

MariaDB [test]> select sql_no_cache status from t1 join t2 on t1.id = t2.t1_id where status='a' \G;
Empty set (0.01 sec)

MariaDB [test]> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 3040  |
| Handler_read_last          | 0     |
| Handler_read_next          | 3039  |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

目前(即在實現直方圖之前),MySQL 和 MariaDB 優化器只有一個“基數”指標——也就是說,它們假設所有值都同樣頻繁地出現。最重要ANALYZE的是,統計數據僅接近基數。(直方圖來了。)

您介紹的兩個SELECTs執行“最佳”方式 - 使用索引。

EXPLAIN“行”通常是不准確的。

“狀態”(和其他類似標誌的)列本身很少值得索引。在多列(“複合”)索引中作為第一列可能更有用。status

SELECT COUNT(*)不是提出論點的好例子——除非這確實是您使用的查詢。

(回复已編輯的問題)…

  • RowsinEXPLAIN是對需要讀取的行數的估計。很少是“檢查的行”的確切數量。“檢查的行數”可以在慢日誌中找到。
  • 為了估計基數,InnoDB “潛入” BTree 以進行一些(可配置的)探測。根據數據的分佈方式,這些探測可能會也可能不會反映實際的基數。
  • 的存在INDEX(status)導致您得出一些結論(我認為)。這也導致恰好 3040 出現在幾個輸出中。(或在 1 以內——找到第一行,然後“read_next 3039 次”。)
  • 基數為 4 的 20 行意味著每個值存在大約 5 個(不是 4 個)副本。
  • 基數意味著對於每個可能的值存在相等數量的行,而不是每第 n 個都具有該值。
  • JOINs– 優化器唯一能做的就是假設值的“均勻”分佈。所以,事情會變得非常混亂。

引用自:https://dba.stackexchange.com/questions/196634