MySQL

MySQL struggling to optimize a slow query with an index, but no improvement; EXPLAIN says otherwise

  • February 22, 2016

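I'm trying to optimize a slow query (about 11 seconds). However, using an index doesn't seem to make any difference, even though EXPLAIN says otherwise. Let me walk you through my problem.

So this is my query: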

SELECT  SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
      posts.eng_rate, posts.created_time, posts.mention_or_post
   FROM  posts
   JOIN  stats ON stats.TW_ID = posts.TW_ID
   WHERE  stats.category = 'hotel'
     AND  posts.mention_or_post = 'tweet'
     AND  posts.eng_rate > 1
     AND  posts.in_reply_to = 0
     AND  posts.created_time > 1453161930
   ORDER BY  eng_rate DESC
   LIMIT  0, 1000; 

Running a profile on the query (same result when forcing the index in the statement) gives:

+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| starting             |  0.000063 |
| checking permissions |  0.000091 |
| checking permissions |  0.000803 |
| Opening tables       |  0.000189 |
| System lock          |  0.000052 |
| init                 |  0.000155 |
| optimizing           |  0.000099 |
| statistics           |  0.001616 |
| preparing            |  0.000059 |
| executing            |  0.000046 |
| Sorting result       |  0.000051 |
| Sending data         | 11.443096 |
| end                  |  0.000471 |
| query end            |  0.000278 |
| closing tables       |  0.001001 |
| freeing items        |  0.001249 |
| logging slow query   |  0.000178 |
| logging slow query   |  0.000853 |
| cleaning up          |  0.000245 |
+----------------------+-----------+

The MySQL slow query log shows:

# Query_time: 11.449220  Lock_time: 0.001186 Rows_sent: 54  Rows_examined: 340139
SET timestamp=1455881044;
SELECT  SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
       posts.eng_rate, posts.created_time, posts.mention_or_post
   FROM  posts
   JOIN  stats ON stats.TW_ID = posts.TW_ID
   WHERE  stats.category = 'hotel'
     AND  posts.mention_or_post = 'tweet'
     AND  posts.eng_rate > 1
     AND  posts.in_reply_to = 0
     AND  posts.created_time > 1453161930
   ORDER BY  eng_rate DESC
   LIMIT  0, 1000; 

Note the rows examined: 340139.

EXPLAIN for the query:

+----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-------------+
| id | select_type | table    | type  | possible_keys                      | key           | key_len | ref                        | rows  | Extra       |
+----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-------------+
|  1 | SIMPLE      | posts    | range | c_id,mention_or_post,scraper_index | scraper_index | 36      | NULL                       | 69801 | Using where |
|  1 | SIMPLE      | stats    | ref   | tw_id                              | tw_id         | 8       | bacon.posts.tw_id          |     1 | Using where |
+----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-------------+

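The row using "scraper_index" shows 69801 rows. OK, so EXPLAIN is only estimating, and the estimate differs a lot from the 340139 rows actually examined, probably because of the join part of the query.

How the indexes are structured: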

+----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+--------------------------------------------------------------------------------+
| Table    | Non_unique | Key_name        | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment                                                                  |
+----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+--------------------------------------------------------------------------------+
| posts    |          0 | PRIMARY         |            1 | p_id            | A         |     4276539 |     NULL | NULL   |      | BTREE      |         |                                                                                |
| posts    |          1 | c_id            |            1 | tw_id           | A         |        1478 |     NULL | NULL   |      | BTREE      |         |                                                                                |
| posts    |          1 | mention_or_post |            1 | mention_or_post | A         |           3 |     NULL | NULL   |      | BTREE      |         |                                                                                |
| posts    |          1 | scraper_index   |            1 | in_reply_to     | A         |      610934 |     NULL | NULL   | YES  | BTREE      |         |                                                                                |
| posts    |          1 | scraper_index   |            2 | mention_or_post | A         |      610934 |     NULL | NULL   |      | BTREE      |         |                                                                                |
| posts    |          1 | scraper_index   |            3 | eng_rate        | A         |      610934 |     NULL | NULL   |      | BTREE      |         |                                                                                |
+----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+--------------------------------------------------------------------------------+

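scraper_index was built and ordered to get the smallest row estimate when running EXPLAIN on the query. I tried different columns and column orders to get the best EXPLAIN result. (For example, created_time has the same cardinality as the number of rows in the table.)

So the query is slow even with the index. But what if I force the query to ignore these indexes on the posts table altogether? The result: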

# Query_time: 12.959047  Lock_time: 0.010025 Rows_sent: 54  Rows_examined: 4385473
SET timestamp=1455883033;
SELECT  SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
       posts.eng_rate, posts.created_time, posts.mention_or_post
   FROM  posts IGNORE INDEX (scraper_index, mention_or_post)
   JOIN  stats ON stats.TW_ID = posts.TW_ID
   WHERE  stats.category = 'hotel'
     AND  posts.mention_or_post = 'tweet'
     AND  posts.eng_rate > 1
     AND  posts.in_reply_to = 0
     AND  posts.created_time > 1453161930
   ORDER BY  eng_rate DESC
   LIMIT  0, 1000; 

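The query time is slightly slower, but the rows examined are now 4385473. To me, this says that the index "scraper_index" is too inefficient, almost as slow as using no index at all.

Posts table in brief: MyISAM engine, 4276539 rows, 1Gb data. Stats table in brief: MyISAM engine, 1506 rows, 32Mb data.

Table structure of posts: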

+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| tw_id           | bigint(20)  | NO   | MUL | NULL    |       |
| p_id            | varchar(50) | NO   | PRI | NULL    |       |
| created_time    | int(11)     | NO   |     | NULL    |       |
| message         | text        | NO   |     | NULL    |       |
| type            | varchar(10) | YES  |     | NULL    |       |
| expanded_url    | text        | YES  |     | NULL    |       |
| eng_rate        | int(11)     | NO   |     | NULL    |       |
| in_reply_to     | bigint(20)  | YES  | MUL | NULL    |       |
| user_id         | varchar(20) | YES  |     | NULL    |       |
| mention_or_post | varchar(7)  | NO   | MUL | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+

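Are my indexes built wrong for the query?

- Update 19/2

Changed "scraper_index" following suggestions from the community: eng_rate was replaced with created_time. Query time dropped by about 2-3 seconds. The indexes now look like this: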

+----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+--------------------------------------------------------------------------------+
| Table    | Non_unique | Key_name        | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment                                                                  |
+----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+--------------------------------------------------------------------------------+
| posts    |          0 | PRIMARY         |            1 | p_id            | A         |     4276539 |     NULL | NULL   |      | BTREE      |         |                                                                                |
| posts    |          1 | c_id            |            1 | tw_id           | A         |        1478 |     NULL | NULL   |      | BTREE      |         |                                                                                |
| posts    |          1 | mention_or_post |            1 | mention_or_post | A         |           3 |     NULL | NULL   |      | BTREE      |         |                                                                                |
| posts    |          1 | scraper_index   |            1 | in_reply_to     | A         |      610934 |     NULL | NULL   | YES  | BTREE      |         |                                                                                |
| posts    |          1 | scraper_index   |            2 | mention_or_post | A         |      610934 |     NULL | NULL   |      | BTREE      |         |                                                                                |
| posts    |          1 | scraper_index   |            3 | created_time    | A         |     4276539 |     NULL | NULL   |      | BTREE      |         |                                                                                |
+----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+--------------------------------------------------------------------------------+

The slow query log shows:

# Query_time: 8.945486  Lock_time: 0.000261 Rows_sent: 54  Rows_examined: 202192
SET timestamp=1455891107;
SELECT  SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
       posts.eng_rate, posts.created_time, posts.mention_or_post
   FROM  posts
   JOIN  stats ON stats.TW_ID = posts.TW_ID
   WHERE  stats.category = 'hotel'
     AND  posts.mention_or_post = 'tweet'
     AND  posts.eng_rate > 1
     AND  posts.in_reply_to = 0
     AND  posts.created_time > 1453161930
   ORDER BY  eng_rate DESC; 

EXPLAIN with the new index:

+----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-----------------------------+
| id | select_type | table    | type  | possible_keys                      | key           | key_len | ref                        | rows  | Extra                       |
+----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-----------------------------+
|  1 | SIMPLE      | posts    | range | c_id,mention_or_post,scraper_index | scraper_index | 36      | NULL                       | 40152 | Using where; Using filesort |
|  1 | SIMPLE      | stats    | ref   | tw_id                              | tw_id         | 8       | bacon.posts.tw_id          |     1 | Using where                 |
+----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-----------------------------+

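- End of update

- Update 22/2

Tonight the specific problem query above ran again and showed up in the MySQL slow query log. The problem remains: high query time, 9-12 seconds. No major improvement was expected from the earlier changes, though.

Today, when I profiled the query, I got unexpectedly fast results (under 0.5 seconds), which makes me suspect caching is involved, even though I always ran the query manually with SQL_NO_CACHE. Maybe it is some kind of disk cache, I don't know. I can't reproduce the slower query times as before, so I'll have to see how these changes behave in the real scenario at the next run.

I modified the existing indexes and created new ones following Rick James's suggestions. The modified indexes are: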

posts:  INDEX(mention_or_post, in_reply_to, created_time) -- Named scraper_index
stats:  INDEX(TW_ID, category) -- Named tw_id

And the new indexes:

stats:  INDEX(category, TW_ID)  -- Named test2
posts:  INDEX(TW_ID, mention_or_post, in_reply_to, created_time) -- Named test1
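For reference, the four index definitions above could be applied with statements along these lines. This is a sketch rather than the statements actually run; it assumes the index names shown above and groups the changes per table so that each MyISAM table is rebuilt only once.

```sql
-- posts: redefine scraper_index and add test1 in a single table rebuild.
ALTER TABLE posts
  DROP INDEX scraper_index,
  ADD INDEX scraper_index (mention_or_post, in_reply_to, created_time),
  ADD INDEX test1 (TW_ID, mention_or_post, in_reply_to, created_time);

-- stats: redefine tw_id and add test2 in a single table rebuild.
ALTER TABLE stats
  DROP INDEX tw_id,
  ADD INDEX tw_id (TW_ID, category),
  ADD INDEX test2 (category, TW_ID);
```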

The query optimizer picks the new indexes "test2" and "test1" as preferred. EXPLAIN shows:

+----+-------------+----------+------+------------------------------------------+-------+---------+----------------------------------------+------+-----------------------------------------------------------+
| id | select_type | table    | type | possible_keys                            | key   | key_len | ref                                    | rows | Extra                                                     |
+----+-------------+----------+------+------------------------------------------+-------+---------+----------------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | stats    | ref  | tw_id,test2                              | test2 | 92      | const                                  |   44 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | posts    | ref  | c_id,scraper_index,mention_or_post,test1 | test1 | 40      | bacon.stats.TW_ID,const,const          |    7 | Using where                                               |
+----+-------------+----------+------+------------------------------------------+-------+---------+----------------------------------------+------+-----------------------------------------------------------+

This looks promising.

First run of the query:

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000080 |
| checking permissions | 0.000004 |
| checking permissions | 0.000004 |
| Opening tables       | 0.000020 |
| System lock          | 0.000010 |
| init                 | 0.000039 |
| optimizing           | 0.000018 |
| statistics           | 0.000421 |
| preparing            | 0.000119 |
| Creating tmp table   | 0.006901 |
| executing            | 0.000006 |
| Copying to tmp table | 0.008742 |
| Sorting result       | 0.000040 |
| Sending data         | 0.000138 |
| end                  | 0.000005 |
| removing tmp table   | 0.000093 |
| end                  | 0.000004 |
| query end            | 0.000003 |
| closing tables       | 0.000009 |
| freeing items        | 0.000323 |
| logging slow query   | 0.000003 |
| logging slow query   | 0.000025 |
| cleaning up          | 0.000003 |
+----------------------+----------+

# Query_time: 0.016658  Lock_time: 0.000108 Rows_sent: 62  Rows_examined: 2933
SET timestamp=1456160782;
SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message, posts.eng_rate, posts.created_time, posts.mention_or_post
FROM posts
JOIN stats
ON stats.TW_ID = posts.TW_ID
WHERE stats.category = 'hotel'
AND posts.mention_or_post = 'tweet'
AND posts.eng_rate > 1
AND posts.in_reply_to = 0
AND posts.created_time > 1453161930
ORDER BY eng_rate DESC;

Running the same query again:

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000075 |
| checking permissions | 0.000005 |
| checking permissions | 0.000004 |
| Opening tables       | 0.000021 |
| System lock          | 0.000009 |
| init                 | 0.000038 |
| optimizing           | 0.000018 |
| statistics           | 0.000237 |
| preparing            | 0.000018 |
| Creating tmp table   | 0.000158 |
| executing            | 0.000003 |
| Copying to tmp table | 0.008009 |
| Sorting result       | 0.000040 |
| Sending data         | 0.000108 |
| end                  | 0.000002 |
| removing tmp table   | 0.000131 |
| end                  | 0.000004 |
| query end            | 0.000005 |
| closing tables       | 0.000009 |
| freeing items        | 0.000014 |
| logging slow query   | 0.000002 |
| logging slow query   | 0.000276 |
| cleaning up          | 0.000003 |
+----------------------+----------+

# Query_time: 0.008897  Lock_time: 0.000102 Rows_sent: 62  Rows_examined: 2933
SET timestamp=1456160909;
SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message, posts.eng_rate, posts.created_time, posts.mention_or_post
FROM posts
JOIN stats
ON stats.TW_ID = posts.TW_ID
WHERE stats.category = 'hotel'
AND posts.mention_or_post = 'tweet'
AND posts.eng_rate > 1
AND posts.in_reply_to = 0
AND posts.created_time > 1453161930
ORDER BY eng_rate DESC;

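So, unless caching is playing tricks on me, this looks promising. I'll have to check tomorrow what the "real" impact is.

- End of update


Maybe my MySQL server's memory and cache configuration is not sufficient for this query. My configuration, in brief: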

key_buffer_size         = 64M
table_open_cache        = 512
sort_buffer_size        = 5M
read_buffer_size        = 5M
read_rnd_buffer_size    = 512K
net_buffer_length       = 8K
myisam_sort_buffer_size = 8M
join_buffer_size        = 1M
thread_cache_size       = 4
max_heap_table_size     = 50M
tmp_table_size          = 50M
query_cache_limit       = 15M
query_cache_size        = 60M

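Is a memory limit the bottleneck for this query? With 340K rows examined out of a potential 4.3M rows, how much memory is needed for the index to be as efficient as possible?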
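One way to look into the memory question on MyISAM is to compare the index size with the key buffer and to check how often index reads miss it. The statements below are a minimal sketch of that check. Keep in mind that the MyISAM key buffer caches index blocks only; data rows are cached by the operating system's file cache, which the settings above do not control.

```sql
-- Index_length in the output is the size of the MyISAM index file for posts.
SHOW TABLE STATUS LIKE 'posts';

-- The configured key buffer, to compare against Index_length.
SHOW VARIABLES LIKE 'key_buffer_size';

-- Key_read_requests: requests to read an index block from the key buffer.
-- Key_reads: requests that missed the buffer and had to read from disk.
SHOW GLOBAL STATUS LIKE 'Key_read%';
```

If Key_reads is a large fraction of Key_read_requests, the key buffer is probably too small for the indexes in use.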

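Does anyone have good advice on what to do next?

Partitioning the "posts" table so that fewer rows are scanned might improve the query time. I could partition the table, but in the future even the table of "fresh" data could contain millions of rows, so this would only be a temporary fix, which is not desirable.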

If it is best to start with stats:

stats:  INDEX(category, TW_ID)  -- "covering"
posts:  INDEX(TW_ID, mention_or_post, in_reply_to, created_time)

In case created_time is a better filter than eng_rate:

posts:  INDEX(mention_or_post, in_reply_to, created_time)
stats:  INDEX(TW_ID, category)  -- "covering"
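Which of the two range columns filters better can be measured directly instead of guessed. The query below is a sketch that counts how many rows each predicate of the original query matches on its own; the column aliases are made up for illustration. It scans the whole table, so expect it to be slow on 4.3M rows.

```sql
-- In MySQL a comparison evaluates to 1, 0 or NULL, and SUM() skips NULLs,
-- so each SUM counts the rows that satisfy one predicate.
SELECT COUNT(*)                        AS total_rows,
       SUM(mention_or_post = 'tweet')  AS is_tweet,
       SUM(in_reply_to = 0)            AS not_a_reply,
       SUM(eng_rate > 1)               AS eng_rate_gt_1,
       SUM(created_time > 1453161930)  AS recent
FROM posts;
```

The range predicate with the smaller count is the more selective one and the better candidate for the last column of the index.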

For posts, start with the columns compared with '=' (in any order), then move on to one "range" column: eng_rate or created_time.

A "flag" (mention_or_post) is hardly worth indexing by itself. If it is a flag, use TINYINT or ENUM, not a bulky VARCHAR(7).
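A conversion to ENUM might look like the sketch below. Only the value 'tweet' appears in the question; the other two values in the ENUM list are hypothetical placeholders and must be replaced with whatever the first query actually returns, otherwise existing rows will not convert cleanly.

```sql
-- List the values that actually occur before changing the column type.
SELECT mention_or_post, COUNT(*) AS row_count
FROM posts
GROUP BY mention_or_post;

-- Hypothetical value list: 'mention' and 'post' are assumptions, not from the question.
ALTER TABLE posts
  MODIFY mention_or_post ENUM('tweet', 'mention', 'post') NOT NULL;
```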

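"To me, this says that the index 'scraper_index' is too inefficient, almost as slow as using no index at all." - Not necessarily. It could be that a bunch of index and/or data blocks were cached, which invalidates the conclusion. Run each query twice and take the second timing. (And either turn off the query cache or include SQL_NO_CACHE.)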
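The run-twice procedure can be sketched as follows for a MySQL 5.x server like the one in the question (the query cache and its variables were removed in MySQL 8.0). The Query_ID passed to SHOW PROFILE is an assumption; use the ID that SHOW PROFILES lists for the second run.

```sql
-- Disable the query cache for this session (MySQL 5.x only).
SET SESSION query_cache_type = OFF;
SET profiling = 1;

-- First run: warms the key buffer and the OS file cache. Ignore its timing.
SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
       posts.eng_rate, posts.created_time, posts.mention_or_post
  FROM posts
  JOIN stats ON stats.TW_ID = posts.TW_ID
 WHERE stats.category = 'hotel'
   AND posts.mention_or_post = 'tweet'
   AND posts.eng_rate > 1
   AND posts.in_reply_to = 0
   AND posts.created_time > 1453161930
 ORDER BY eng_rate DESC;

-- Second run: repeat the identical statement here, then compare the timings.

SHOW PROFILES;              -- lists Query_ID and Duration for each profiled statement
SHOW PROFILE FOR QUERY 2;   -- replace 2 with the Query_ID of the second run
```

Comparing two index choices is only meaningful when both are timed this way, with warm caches on both sides.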

(Please provide SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)

More on creating indexes.

What exactly is in stats? "1506 rows, 32Mb data" implies a lot of something.

When will you move to InnoDB?
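The engine change itself is a single statement per table, sketched below. Two things are worth knowing before running it: InnoDB caches both data and indexes in the buffer pool (sized by innodb_buffer_pool_size, not key_buffer_size), and every InnoDB secondary index implicitly carries the primary key, which here is the VARCHAR(50) column p_id.

```sql
-- Each statement rebuilds the whole table, so run it during a quiet period.
ALTER TABLE posts ENGINE = InnoDB;
ALTER TABLE stats ENGINE = InnoDB;
```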

Source: https://dba.stackexchange.com/questions/129795