MySQL
Why is a WHERE on an indexed column slow?
We have a large table. Its structure is:
CREATE TABLE `visit_logs` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `user_id` int(10) NOT NULL,
  `ref_id` int(10) NOT NULL DEFAULT '0',
  `link_id` int(12) NOT NULL,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `day_of_week` tinyint(1) NOT NULL,
  `jalali_month` tinyint(2) NOT NULL,
  `value` int(9) NOT NULL,
  `ref_share` float(4,2) NOT NULL DEFAULT '0.00',
  `balance` int(9) NOT NULL,
  `ads_id` int(12) NOT NULL,
  `ip` varchar(15) NOT NULL,
  `hash` varchar(32) NOT NULL,
  `referer` text,
  `shw` text,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`),
  KEY `jalali_month` (`jalali_month`),
  KEY `value` (`value`),
  KEY `ads_id` (`ads_id`),
  KEY `ip` (`ip`),
  KEY `hash` (`hash`),
  KEY `link_id` (`link_id`),
  KEY `day_of_week` (`day_of_week`),
  KEY `ref_share` (`ref_share`),
  KEY `ref_id` (`ref_id`),
  KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Here are some sample queries:
mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10443393 | 200940970 |
+----------+-----------+
1 row in set (6.02 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where `date` > NOW() - INTERVAL 90 day;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10443354 | 200940430 |
+----------+-----------+
1 row in set (9.24 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` ;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 23641291 | 596178719 |
+----------+-----------+
1 row in set (3.32 sec)
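As you can see, selecting all rows is faster than selecting a subset filtered by id, and filtering by id is in turn faster than filtering by timestamp. Both the id and the timestamp columns are indexed!
The MySQL version is 5.1. My my.cnf: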
thread_concurrency = 8
query_cache_size = 1G
thread_cache_size = 8
myisam_sort_buffer_size = 1G
read_rnd_buffer_size = 256M
read_buffer_size = 512M
sort_buffer_size = 512M
table_open_cache = 512
max_allowed_packet = 20M
key_buffer_size = 1G
#log = /var/log/mysqlq.log
wait_timeout=120
connect_timeout=50
#max_execution_time=60000
tmp_table_size=2G
max_heap_table_size=2G
EXPLAIN output:
explain SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where id>13197856;
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | visit_logs | ALL  | PRIMARY       | NULL | NULL    | NULL | 12978097 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)

mysql> explain SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` force index (PRIMARY) where id>13197856;
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | visit_logs | range | PRIMARY       | PRIMARY | 4       | NULL | 10604250 | Using where |
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
1 row in set (0.00 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` force index (PRIMARY) where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534567 | 202721650 |
+----------+-----------+
1 row in set (10.73 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` force index (PRIMARY) where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534578 | 202721830 |
+----------+-----------+
1 row in set (10.75 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534588 | 202722030 |
+----------+-----------+
1 row in set (3.50 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534592 | 202722070 |
+----------+-----------+
1 row in set (3.53 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` ;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 12978727 | 262590120 |
+----------+-----------+
1 row in set (1.66 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` ;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 12978730 | 262590180 |
+----------+-----------+
1 row in set (1.65 sec)
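Using an index means bouncing back and forth between the index and the data.
In MyISAM, each index is a BTree stored in the .MYI file. The leaf nodes of the index hold pointers into the .MYD file. (Or, for FIXED row format, a record number.) Your SELECTs will happily scan the index linearly (BTrees are efficient at that), but for each row MySQL must follow the pointer and "seek" into the .MYD file to fetch any columns that are not in the index. Since you are fetching about half the table, that is a lot of work.
Without an index, MyISAM simply scans the .MYD file (in whatever order the rows are stored). That is more efficient than bouncing back and forth through an index.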
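One way to see the cost of the .MYD lookups is to remove them: if an index contains every column the query reads, MySQL can answer from the .MYI file alone. The following is a minimal sketch, not a tested recommendation. The index name date_value is hypothetical, and building an index on a table of this size will take time and extra disk space.

```sql
-- Hypothetical composite index: `date` serves the WHERE clause and
-- `value` serves SUM(value), so no column has to be fetched from the .MYD file.
ALTER TABLE visit_logs ADD INDEX date_value (`date`, `value`);

-- Re-check the plan. If the index is used as a covering index,
-- the Extra column should include "Using index".
EXPLAIN SELECT COUNT(*), SUM(value) AS revenue
FROM visit_logs
WHERE `date` > NOW() - INTERVAL 90 DAY;
```

Whether this actually beats the plain table scan depends on caching and on how much of the table the range covers, so it is worth timing both variants.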
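Normally (without FORCE INDEX), the optimizer chooses a table scan if the estimated number of rows to look up through the index exceeds roughly 20% of the table. That is, the optimizer usually does the "right thing" by ignoring the index. If that does not account for all of the difference, then I suspect caching differences: data blocks (.MYD) cached by the OS and/or index blocks cached in the key_buffer.
Note a difference between MyISAM and InnoDB: in MyISAM, the PRIMARY KEY is stored just like any other index. In InnoDB, it is clustered with the data, so a range scan such as WHERE id > ... is a range scan over the data itself. It is therefore more efficient, and you would not see the 10-second timings. (There are other significant differences.)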
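To compare the two engines without touching the live table, one option is to load a copy into InnoDB and run the same query there. This is only a sketch under assumptions: visit_logs_innodb is a hypothetical table name, the copy needs time and disk space for a table of this size, and InnoDB must be available and given a reasonable innodb_buffer_pool_size (key_buffer_size applies only to MyISAM indexes).

```sql
-- Create an empty copy with the same columns and indexes, then switch its engine.
CREATE TABLE visit_logs_innodb LIKE visit_logs;
ALTER TABLE visit_logs_innodb ENGINE=InnoDB;

-- Copy the rows; InnoDB stores them clustered on the PRIMARY KEY (id).
INSERT INTO visit_logs_innodb SELECT * FROM visit_logs;

-- Compare the plan and timing with the MyISAM table.
EXPLAIN SELECT COUNT(*), SUM(value) AS revenue
FROM visit_logs_innodb
WHERE id > 13197856;
```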