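MySQL: struggling to optimize a slow query with indexes, no improvement, although EXPLAIN says otherwise
I'm trying to optimize a slow query (about 11 seconds). However, using indexes seems to make no difference, even though EXPLAIN suggests otherwise. Let me walk you through my problem.
So this is my query: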
    SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
           posts.eng_rate, posts.created_time, posts.mention_or_post
    FROM posts
    JOIN stats ON stats.TW_ID = posts.TW_ID
    WHERE stats.category = 'hotel'
      AND posts.mention_or_post = 'tweet'
      AND posts.eng_rate > 1
      AND posts.in_reply_to = 0
      AND posts.created_time > 1453161930
    ORDER BY eng_rate DESC
    LIMIT 0, 1000;
Profiling the query (forcing the index in the statement gives the same result) shows:
    +----------------------+-----------+
    | Status               | Duration  |
    +----------------------+-----------+
    | starting             |  0.000063 |
    | checking permissions |  0.000091 |
    | checking permissions |  0.000803 |
    | Opening tables       |  0.000189 |
    | System lock          |  0.000052 |
    | init                 |  0.000155 |
    | optimizing           |  0.000099 |
    | statistics           |  0.001616 |
    | preparing            |  0.000059 |
    | executing            |  0.000046 |
    | Sorting result       |  0.000051 |
    | Sending data         | 11.443096 |
    | end                  |  0.000471 |
    | query end            |  0.000278 |
    | closing tables       |  0.001001 |
    | freeing items        |  0.001249 |
    | logging slow query   |  0.000178 |
    | logging slow query   |  0.000853 |
    | cleaning up          |  0.000245 |
    +----------------------+-----------+
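For readers who want to reproduce a profile like this one, here is a minimal sketch. It assumes a MySQL version that still ships the (deprecated) SHOW PROFILE facility, and that the SELECT is the first statement profiled in the session, so it gets Query_ID 1.

```sql
-- Turn on per-session statement profiling.
SET profiling = 1;

-- Run the statement to be profiled.
SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
       posts.eng_rate, posts.created_time, posts.mention_or_post
FROM posts
JOIN stats ON stats.TW_ID = posts.TW_ID
WHERE stats.category = 'hotel'
  AND posts.mention_or_post = 'tweet'
  AND posts.eng_rate > 1
  AND posts.in_reply_to = 0
  AND posts.created_time > 1453161930
ORDER BY eng_rate DESC
LIMIT 0, 1000;

-- List the profiled statements with their Query_ID values.
SHOW PROFILES;

-- Stage-by-stage timing; 1 is an assumed Query_ID, take the real one from SHOW PROFILES.
SHOW PROFILE FOR QUERY 1;
```

The "Sending data" stage covers reading and filtering rows as well as returning them to the client, so a long "Sending data" time usually points at row access rather than network transfer.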
The MySQL slow query log shows:
    # Query_time: 11.449220  Lock_time: 0.001186  Rows_sent: 54  Rows_examined: 340139
    SET timestamp=1455881044;
    SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
           posts.eng_rate, posts.created_time, posts.mention_or_post
    FROM posts
    JOIN stats ON stats.TW_ID = posts.TW_ID
    WHERE stats.category = 'hotel'
      AND posts.mention_or_post = 'tweet'
      AND posts.eng_rate > 1
      AND posts.in_reply_to = 0
      AND posts.created_time > 1453161930
    ORDER BY eng_rate DESC
    LIMIT 0, 1000;
Note the rows examined: 340139.
EXPLAIN for the query:
    +----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-------------+
    | id | select_type | table    | type  | possible_keys                      | key           | key_len | ref                        | rows  | Extra       |
    +----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-------------+
    |  1 | SIMPLE      | posts    | range | c_id,mention_or_post,scraper_index | scraper_index | 36      | NULL                       | 69801 | Using where |
    |  1 | SIMPLE      | stats    | ref   | tw_id                              | tw_id         | 8       | bacon.posts.tw_id          |     1 | Using where |
    +----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-------------+
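The row using "scraper_index" shows 69801 rows. OK, so EXPLAIN is only estimating, and the estimate is far from the 340139 rows actually examined, probably because of the join part of the query.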
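One way to see where the examined rows really come from, rather than relying on the EXPLAIN estimate, is to read the session's handler counters around a single run. This is only a sketch: it assumes the account may run FLUSH STATUS (which needs the RELOAD privilege) and that nothing else runs in the same session in between.

```sql
-- Reset the session status counters.
FLUSH STATUS;

SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
       posts.eng_rate, posts.created_time, posts.mention_or_post
FROM posts
JOIN stats ON stats.TW_ID = posts.TW_ID
WHERE stats.category = 'hotel'
  AND posts.mention_or_post = 'tweet'
  AND posts.eng_rate > 1
  AND posts.in_reply_to = 0
  AND posts.created_time > 1453161930
ORDER BY eng_rate DESC
LIMIT 0, 1000;

-- Handler_read_key:      index lookups (for example, one probe into stats per posts row)
-- Handler_read_next:     rows read while walking an index range
-- Handler_read_rnd_next: rows read by a table scan
SHOW SESSION STATUS LIKE 'Handler_read%';
```

The counters are actual row operations for this one run, so they are directly comparable with Rows_examined in the slow query log.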
How the indexes are structured:
    +----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | Table    | Non_unique | Key_name        | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
    +----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | posts    |          0 | PRIMARY         |            1 | p_id            | A         |     4276539 | NULL     | NULL   |      | BTREE      |         |               |
    | posts    |          1 | c_id            |            1 | tw_id           | A         |        1478 | NULL     | NULL   |      | BTREE      |         |               |
    | posts    |          1 | mention_or_post |            1 | mention_or_post | A         |           3 | NULL     | NULL   |      | BTREE      |         |               |
    | posts    |          1 | scraper_index   |            1 | in_reply_to     | A         |      610934 | NULL     | NULL   | YES  | BTREE      |         |               |
    | posts    |          1 | scraper_index   |            2 | mention_or_post | A         |      610934 | NULL     | NULL   |      | BTREE      |         |               |
    | posts    |          1 | scraper_index   |            3 | eng_rate        | A         |      610934 | NULL     | NULL   |      | BTREE      |         |               |
    +----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
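scraper_index was built and ordered with the aim of getting the smallest row estimate when running EXPLAIN on the query. I tried different columns and column orders to get the best result from EXPLAIN. (For example, created_time has the same cardinality as the number of rows in the table.)
So the query is slow even when it uses the index. But what if I force the query to use no index at all (for table posts)? The result: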
    # Query_time: 12.959047  Lock_time: 0.010025  Rows_sent: 54  Rows_examined: 4385473
    SET timestamp=1455883033;
    SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
           posts.eng_rate, posts.created_time, posts.mention_or_post
    FROM posts IGNORE INDEX (scraper_index, mention_or_post)
    JOIN stats ON stats.TW_ID = posts.TW_ID
    WHERE stats.category = 'hotel'
      AND posts.mention_or_post = 'tweet'
      AND posts.eng_rate > 1
      AND posts.in_reply_to = 0
      AND posts.created_time > 1453161930
    ORDER BY eng_rate DESC
    LIMIT 0, 1000;
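The query time is slightly slower, but rows examined is now 4385473. To me, this says that the index "scraper_index" is too inefficient, almost as slow as no index at all.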
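To check which plan each hint actually produces before timing anything, the two variants can be compared with EXPLAIN. A sketch using the same query; note that the IGNORE INDEX list used above does not include c_id, so the optimizer may still pick that index for the join.

```sql
-- Plan when the optimizer is told to use scraper_index.
EXPLAIN
SELECT posts.id, posts.p_id, posts.type, posts.message,
       posts.eng_rate, posts.created_time, posts.mention_or_post
FROM posts FORCE INDEX (scraper_index)
JOIN stats ON stats.TW_ID = posts.TW_ID
WHERE stats.category = 'hotel'
  AND posts.mention_or_post = 'tweet'
  AND posts.eng_rate > 1
  AND posts.in_reply_to = 0
  AND posts.created_time > 1453161930
ORDER BY eng_rate DESC
LIMIT 0, 1000;

-- Plan when scraper_index and mention_or_post are hidden from the optimizer.
EXPLAIN
SELECT posts.id, posts.p_id, posts.type, posts.message,
       posts.eng_rate, posts.created_time, posts.mention_or_post
FROM posts IGNORE INDEX (scraper_index, mention_or_post)
JOIN stats ON stats.TW_ID = posts.TW_ID
WHERE stats.category = 'hotel'
  AND posts.mention_or_post = 'tweet'
  AND posts.eng_rate > 1
  AND posts.in_reply_to = 0
  AND posts.created_time > 1453161930
ORDER BY eng_rate DESC
LIMIT 0, 1000;
```

If the second plan shows type ALL for posts, the 4385473 rows examined correspond to a full table scan plus the lookups into stats.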
Posts table in short: MyISAM engine, 4276539 rows, 1 GB of data. Stats table in short: MyISAM engine, 1506 rows, 32 MB of data.
Table structure of posts:
    +-----------------+-------------+------+-----+---------+-------+
    | Field           | Type        | Null | Key | Default | Extra |
    +-----------------+-------------+------+-----+---------+-------+
    | tw_id           | bigint(20)  | NO   | MUL | NULL    |       |
    | p_id            | varchar(50) | NO   | PRI | NULL    |       |
    | created_time    | int(11)     | NO   |     | NULL    |       |
    | message         | text        | NO   |     | NULL    |       |
    | type            | varchar(10) | YES  |     | NULL    |       |
    | expanded_url    | text        | YES  |     | NULL    |       |
    | eng_rate        | int(11)     | NO   |     | NULL    |       |
    | in_reply_to     | bigint(20)  | YES  | MUL | NULL    |       |
    | user_id         | varchar(20) | YES  |     | NULL    |       |
    | mention_or_post | varchar(7)  | NO   | MUL | NULL    |       |
    +-----------------+-------------+------+-----+---------+-------+
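Are my indexes built wrongly for this query?
- Update 19/2
Changed "scraper_index" as suggested by the community: replaced eng_rate with created_time. The query time dropped by about 2-3 seconds. The indexes now look like this: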
    +----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | Table    | Non_unique | Key_name        | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
    +----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
    | posts    |          0 | PRIMARY         |            1 | p_id            | A         |     4276539 | NULL     | NULL   |      | BTREE      |         |               |
    | posts    |          1 | c_id            |            1 | tw_id           | A         |        1478 | NULL     | NULL   |      | BTREE      |         |               |
    | posts    |          1 | mention_or_post |            1 | mention_or_post | A         |           3 | NULL     | NULL   |      | BTREE      |         |               |
    | posts    |          1 | scraper_index   |            1 | in_reply_to     | A         |      610934 | NULL     | NULL   | YES  | BTREE      |         |               |
    | posts    |          1 | scraper_index   |            2 | mention_or_post | A         |      610934 | NULL     | NULL   |      | BTREE      |         |               |
    | posts    |          1 | scraper_index   |            3 | created_time    | A         |     4276539 | NULL     | NULL   |      | BTREE      |         |               |
    +----------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
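The post does not show the statement used for this change. A change of this kind can be made in one ALTER TABLE, sketched below using the column order from the listing above. On a MyISAM table of this size the rebuild can take a while and blocks writes in the meantime.

```sql
-- Swap the third column of scraper_index from eng_rate to created_time.
ALTER TABLE posts
  DROP INDEX scraper_index,
  ADD INDEX scraper_index (in_reply_to, mention_or_post, created_time);

-- Refresh the index statistics the optimizer uses for its row estimates.
ANALYZE TABLE posts;

-- Confirm the new layout.
SHOW INDEX FROM posts;
```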
The slow query log shows:
    # Query_time: 8.945486  Lock_time: 0.000261  Rows_sent: 54  Rows_examined: 202192
    SET timestamp=1455891107;
    SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
           posts.eng_rate, posts.created_time, posts.mention_or_post
    FROM posts
    JOIN stats ON stats.TW_ID = posts.TW_ID
    WHERE stats.category = 'hotel'
      AND posts.mention_or_post = 'tweet'
      AND posts.eng_rate > 1
      AND posts.in_reply_to = 0
      AND posts.created_time > 1453161930
    ORDER BY eng_rate DESC;
EXPLAIN with the new index:
    +----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-----------------------------+
    | id | select_type | table    | type  | possible_keys                      | key           | key_len | ref                        | rows  | Extra                       |
    +----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-----------------------------+
    |  1 | SIMPLE      | posts    | range | c_id,mention_or_post,scraper_index | scraper_index | 36      | NULL                       | 40152 | Using where; Using filesort |
    |  1 | SIMPLE      | stats    | ref   | tw_id                              | tw_id         | 8       | bacon.posts.tw_id          |     1 | Using where                 |
    +----+-------------+----------+-------+------------------------------------+---------------+---------+----------------------------+-------+-----------------------------+
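- End of update
- Update 22/2
I checked the MySQL slow query log for tonight's run of the specific query above and found that the problem remains: query times are still high, 9-12 seconds. Then again, no major improvement was expected from the earlier changes.
Today, when I profiled the query, I got unexpectedly fast results (under 0.5 seconds), which makes me suspect that some kind of caching is involved, even though I have run the query manually with SQL_NO_CACHE the whole time. Maybe it is some kind of disk cache, I don't know, but I could not reproduce the slower query times as before, so I will have to see how these changes behave in the real situation at the next run.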
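SQL_NO_CACHE only bypasses the query cache. With MyISAM, index blocks are still cached in the key buffer and data blocks in the operating system's file cache, either of which could explain a fast second run. A rough sketch for checking the key buffer side (the counters are server-wide and cumulative since startup):

```sql
-- Key_read_requests: index block reads requested from the key cache
-- Key_reads:         index block reads that had to go to disk
SHOW GLOBAL STATUS LIKE 'Key_read%';

-- Run the slow query here, then read the counters again and compare the deltas.
SHOW GLOBAL STATUS LIKE 'Key_read%';
```

A large jump in Key_reads during the slow run, and almost none during the fast run, would suggest that the difference comes from cold versus warm index blocks. Data-file caching by the operating system is not visible in these counters.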
Modified the existing indexes and created new ones as suggested by Rick James. The modified indexes are:
    posts: INDEX(mention_or_post, in_reply_to, created_time)  -- Named scraper_index
    stats: INDEX(TW_ID, category)  -- Named tw_id
And the new indexes:
    stats: INDEX(category, TW_ID)  -- Named test2
    posts: INDEX(TW_ID, mention_or_post, in_reply_to, created_time)  -- Named test1
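Written out as DDL, the four index definitions above might look like the following. This is a sketch that reuses the index names from the listing and assumes the existing scraper_index and tw_id indexes are dropped and recreated under the same names.

```sql
ALTER TABLE posts
  DROP INDEX scraper_index,
  ADD INDEX scraper_index (mention_or_post, in_reply_to, created_time),
  ADD INDEX test1 (TW_ID, mention_or_post, in_reply_to, created_time);

ALTER TABLE stats
  DROP INDEX tw_id,
  ADD INDEX tw_id (TW_ID, category),
  ADD INDEX test2 (category, TW_ID);
```

Grouping the changes per table in a single ALTER TABLE means each MyISAM table is rebuilt once rather than once per index.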
The query optimizer picks the new indexes "test2" and "test1" as the preferred indexes. EXPLAIN says:
    +----+-------------+----------+------+------------------------------------------+-------+---------+----------------------------------------+------+-----------------------------------------------------------+
    | id | select_type | table    | type | possible_keys                            | key   | key_len | ref                                    | rows | Extra                                                     |
    +----+-------------+----------+------+------------------------------------------+-------+---------+----------------------------------------+------+-----------------------------------------------------------+
    |  1 | SIMPLE      | stats    | ref  | tw_id,test2                              | test2 | 92      | const                                  |   44 | Using where; Using index; Using temporary; Using filesort |
    |  1 | SIMPLE      | posts    | ref  | c_id,scraper_index,mention_or_post,test1 | test1 | 40      | bacon.stats.TW_ID,const,const          |    7 | Using where                                               |
    +----+-------------+----------+------+------------------------------------------+-------+---------+----------------------------------------+------+-----------------------------------------------------------+
This looks promising.
Running the query for the first time:
    +----------------------+----------+
    | Status               | Duration |
    +----------------------+----------+
    | starting             | 0.000080 |
    | checking permissions | 0.000004 |
    | checking permissions | 0.000004 |
    | Opening tables       | 0.000020 |
    | System lock          | 0.000010 |
    | init                 | 0.000039 |
    | optimizing           | 0.000018 |
    | statistics           | 0.000421 |
    | preparing            | 0.000119 |
    | Creating tmp table   | 0.006901 |
    | executing            | 0.000006 |
    | Copying to tmp table | 0.008742 |
    | Sorting result       | 0.000040 |
    | Sending data         | 0.000138 |
    | end                  | 0.000005 |
    | removing tmp table   | 0.000093 |
    | end                  | 0.000004 |
    | query end            | 0.000003 |
    | closing tables       | 0.000009 |
    | freeing items        | 0.000323 |
    | logging slow query   | 0.000003 |
    | logging slow query   | 0.000025 |
    | cleaning up          | 0.000003 |
    +----------------------+----------+

    # Query_time: 0.016658  Lock_time: 0.000108  Rows_sent: 62  Rows_examined: 2933
    SET timestamp=1456160782;
    SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
           posts.eng_rate, posts.created_time, posts.mention_or_post
    FROM posts
    JOIN stats ON stats.TW_ID = posts.TW_ID
    WHERE stats.category = 'hotel'
      AND posts.mention_or_post = 'tweet'
      AND posts.eng_rate > 1
      AND posts.in_reply_to = 0
      AND posts.created_time > 1453161930
    ORDER BY eng_rate DESC;
Running the same query again:
    +----------------------+----------+
    | Status               | Duration |
    +----------------------+----------+
    | starting             | 0.000075 |
    | checking permissions | 0.000005 |
    | checking permissions | 0.000004 |
    | Opening tables       | 0.000021 |
    | System lock          | 0.000009 |
    | init                 | 0.000038 |
    | optimizing           | 0.000018 |
    | statistics           | 0.000237 |
    | preparing            | 0.000018 |
    | Creating tmp table   | 0.000158 |
    | executing            | 0.000003 |
    | Copying to tmp table | 0.008009 |
    | Sorting result       | 0.000040 |
    | Sending data         | 0.000108 |
    | end                  | 0.000002 |
    | removing tmp table   | 0.000131 |
    | end                  | 0.000004 |
    | query end            | 0.000005 |
    | closing tables       | 0.000009 |
    | freeing items        | 0.000014 |
    | logging slow query   | 0.000002 |
    | logging slow query   | 0.000276 |
    | cleaning up          | 0.000003 |
    +----------------------+----------+

    # Query_time: 0.008897  Lock_time: 0.000102  Rows_sent: 62  Rows_examined: 2933
    SET timestamp=1456160909;
    SELECT SQL_NO_CACHE posts.id, posts.p_id, posts.type, posts.message,
           posts.eng_rate, posts.created_time, posts.mention_or_post
    FROM posts
    JOIN stats ON stats.TW_ID = posts.TW_ID
    WHERE stats.category = 'hotel'
      AND posts.mention_or_post = 'tweet'
      AND posts.eng_rate > 1
      AND posts.in_reply_to = 0
      AND posts.created_time > 1453161930
    ORDER BY eng_rate DESC;
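So, unless a cache is fooling me, this looks promising. I will have to check tomorrow what the "real" impact is.
- End of update
Maybe my MySQL server's memory configuration is not sufficient for this query. My configuration in short: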
    key_buffer_size = 64M
    table_open_cache = 512
    sort_buffer_size = 5M
    read_buffer_size = 5M
    read_rnd_buffer_size = 512K
    net_buffer_length = 8K
    myisam_sort_buffer_size = 8M
    join_buffer_size = 1M
    thread_cache_size = 4
    max_heap_table_size = 50M
    tmp_table_size = 50M
    query_cache_limit = 15M
    query_cache_size = 60M
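Is the memory limit the bottleneck for this query? It examined 340K rows out of a potential 4.3M. How much memory is needed to make the index as efficient as possible?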
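A first, rough way to approach the memory question is to compare the total size of the MyISAM indexes with key_buffer_size. A sketch, assuming the schema is named bacon (as the ref column in the EXPLAIN output suggests):

```sql
-- Data and index size per table, in MB.
SELECT TABLE_NAME,
       ENGINE,
       TABLE_ROWS,
       ROUND(DATA_LENGTH  / 1024 / 1024) AS data_mb,
       ROUND(INDEX_LENGTH / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'bacon'
  AND TABLE_NAME IN ('posts', 'stats');

-- Current key buffer size in bytes, to compare with the index sizes above.
SHOW VARIABLES LIKE 'key_buffer_size';
```

If index_mb for posts alone is well above 64 MB, the indexes cannot all stay in the key buffer at once, and index blocks will be read from disk more often.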
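Does anyone have good advice on what to do next?
Partitioning table "posts" so that each partition holds fewer rows might improve the query time. I could partition the table, but in the future the "fresh" data partition could itself contain millions of rows, so it would only be a temporary solution, which is not desirable.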
If it is best to start with stats:

    stats: INDEX(category, TW_ID)  -- "covering"
    posts: INDEX(TW_ID, mention_or_post, in_reply_to, created_time)

In case created_time is a better filter than eng_rate:

    posts: INDEX(mention_or_post, in_reply_to, created_time)
    stats: INDEX(TW_ID, category)  -- "covering"
For posts, start with the columns compared with '=' (in any order), then move on to one "range" column: eng_rate or created_time.
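As an illustration of that rule with the other range candidate, the sketch below puts the two '=' columns first and eng_rate last. The index name eq_then_eng_rate is made up for this example. Whether it beats the created_time variant depends on which range condition filters out more rows, so compare the plans and the second-run timings.

```sql
-- Equality columns first (mention_or_post, in_reply_to), then ONE range column.
ALTER TABLE posts
  ADD INDEX eq_then_eng_rate (mention_or_post, in_reply_to, eng_rate);

-- Check whether the optimizer picks it and what it estimates.
EXPLAIN
SELECT posts.id, posts.p_id, posts.type, posts.message,
       posts.eng_rate, posts.created_time, posts.mention_or_post
FROM posts
JOIN stats ON stats.TW_ID = posts.TW_ID
WHERE stats.category = 'hotel'
  AND posts.mention_or_post = 'tweet'
  AND posts.eng_rate > 1
  AND posts.in_reply_to = 0
  AND posts.created_time > 1453161930
ORDER BY eng_rate DESC;
```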
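A "flag" (mention_or_post) is rarely worth indexing by itself. If it is a flag, use TINYINT or ENUM, not a bulky VARCHAR(7).

"To me, this says that the index 'scraper_index' is too inefficient, almost as slow as no index at all." Not necessarily. A bunch of index and/or data blocks may have been cached, which would invalidate that conclusion. Run each query twice and time the second run. (And either turn off the query cache or include SQL_NO_CACHE.)

(Please provide SHOW CREATE TABLE; it is more descriptive than DESCRIBE.) More on creating indexes.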
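For the flag column mentioned above, a conversion to ENUM could look like this sketch. Only 'tweet' appears in the post; 'mention' and 'post' are guesses at the other values, so list the real ones first and adjust the ENUM to match.

```sql
-- Find the distinct values actually stored (the index cardinality suggests about 3).
SELECT mention_or_post, COUNT(*) AS row_count
FROM posts
GROUP BY mention_or_post;

-- Hypothetical value list: replace it with the values returned above.
-- Any stored value missing from the list is rejected in strict SQL mode
-- or turned into '' otherwise, so it must be complete.
ALTER TABLE posts
  MODIFY mention_or_post ENUM('tweet', 'mention', 'post') NOT NULL;
```

An ENUM with this few values is stored in one byte per row, which also shrinks every index that contains the column.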
What exactly is stats? "1506 rows, 32 MB of data" implies a lot of stuff. When will you migrate to InnoDB?
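If a migration to InnoDB is attempted, the engine change itself is a single statement per table. A minimal sketch, to be tried on a copy first, since the tables are rebuilt and locked while it runs:

```sql
-- Convert both tables; each statement rebuilds the table and its indexes.
ALTER TABLE stats ENGINE = InnoDB;
ALTER TABLE posts ENGINE = InnoDB;

-- InnoDB caches data and indexes in the buffer pool, not in key_buffer_size.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```

With InnoDB, rows are stored in primary key order and every secondary index carries the primary key, so the VARCHAR(50) p_id would be repeated in each index on posts. That is worth keeping in mind when sizing the buffer pool.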