MySQL

Why is a WHERE clause on an indexed column slow?

  • April 25, 2016

We have a large table. Its structure is:

CREATE TABLE `visit_logs` (
 `id` int(10) NOT NULL AUTO_INCREMENT,
 `user_id` int(10) NOT NULL,
 `ref_id` int(10) NOT NULL DEFAULT '0',
 `link_id` int(12) NOT NULL,
 `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `day_of_week` tinyint(1) NOT NULL,
 `jalali_month` tinyint(2) NOT NULL ,
 `value` int(9) NOT NULL ,
 `ref_share` float(4,2) NOT NULL DEFAULT '0.00',
 `balance` int(9) NOT NULL ,
 `ads_id` int(12) NOT NULL,
 `ip` varchar(15) NOT NULL,
 `hash` varchar(32) NOT NULL,
 `referer` text,
 `shw` text,
 PRIMARY KEY (`id`),
 KEY `user_id` (`user_id`),
 KEY `jalali_month` (`jalali_month`),
 KEY `value` (`value`),
 KEY `ads_id` (`ads_id`),
 KEY `ip` (`ip`),
 KEY `hash` (`hash`),
 KEY `link_id` (`link_id`),
 KEY `day_of_week` (`day_of_week`),
 KEY `ref_share` (`ref_share`),
 KEY `ref_id` (`ref_id`),
 KEY `date` (`date`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;

Here are some sample queries:

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10443393 | 200940970 |
+----------+-----------+
1 row in set (6.02 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where `date`  > NOW() - INTERVAL 90 day;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10443354 | 200940430 |
+----------+-----------+
1 row in set (9.24 sec)

mysql> SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` ;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 23641291 | 596178719 |
+----------+-----------+
1 row in set (3.32 sec)

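As you can see, selecting all rows is faster than selecting a subset of rows filtered by id, and filtering by id is in turn faster than filtering by timestamp. Both the id and the timestamp columns are indexed!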

The MySQL version is 5.1. my.conf:

thread_concurrency = 8
query_cache_size = 1G
thread_cache_size = 8
myisam_sort_buffer_size = 1G
read_rnd_buffer_size = 256M
read_buffer_size = 512M
sort_buffer_size = 512M
table_open_cache = 512
max_allowed_packet = 20M
key_buffer_size = 1G
#log = /var/log/mysqlq.log
wait_timeout=120
connect_timeout=50
#max_execution_time=60000
tmp_table_size=2G
max_heap_table_size=2G

EXPLAIN output:

explain SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` where id>13197856;
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | visit_logs | ALL  | PRIMARY       | NULL | NULL    | NULL | 12978097 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` force index (PRIMARY) where id>13197856;
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | visit_logs | range | PRIMARY       | PRIMARY | 4       | NULL | 10604250 | Using where |
+----+-------------+------------+-------+---------------+---------+---------+------+----------+-------------+
1 row in set (0.00 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` force index (PRIMARY) where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534567 | 202721650 |
+----------+-----------+
1 row in set (10.73 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` force index (PRIMARY) where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534578 | 202721830 |
+----------+-----------+
1 row in set (10.75 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs`  where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534588 | 202722030 |
+----------+-----------+
1 row in set (3.50 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs`  where id>13197856;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 10534592 | 202722070 |
+----------+-----------+
1 row in set (3.53 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` ;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 12978727 | 262590120 |
+----------+-----------+
1 row in set (1.66 sec)

mysql>  SELECT count(*),SUM( value ) AS `revenue` FROM `visit_logs` ;
+----------+-----------+
| count(*) | revenue   |
+----------+-----------+
| 12978730 | 262590180 |
+----------+-----------+
1 row in set (1.65 sec)

Using an index involves bouncing back and forth between the index and the data.

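In MyISAM, each index is a BTree stored in the .MYI file. The leaf nodes of the index hold pointers into the .MYD file. (Or, for the FIXED row format, a record number.) Your SELECTs will happily scan linearly through the index (BTrees are efficient at this), but for each row MyISAM must use the pointer to "seek" into the .MYD file to fetch any columns that are not in the index.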

Since you are fetching about half the table, that is a lot of work.
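One way to see the cost of those per-row seeks is to remove them. The sketch below is not something the answer proposes; it assumes you can afford one more index, and the index name `date_value` is hypothetical. Because the index contains both the filtered column and the summed column, the query can in principle be answered from the .MYI file alone.

```sql
-- Hypothetical covering index (the name is an assumption, not part of the
-- original schema). Building it on a table this size takes time and
-- blocks writes to a MyISAM table while it runs.
ALTER TABLE `visit_logs` ADD INDEX `date_value` (`date`, `value`);

-- If the optimizer chooses the new index, the Extra column should
-- include "Using index", meaning no lookups into the .MYD file.
EXPLAIN SELECT COUNT(*), SUM(`value`) AS `revenue`
FROM `visit_logs`
WHERE `date` > NOW() - INTERVAL 90 DAY;
```

Whether this pays off depends on write volume and on how many indexes the table already carries, so it is worth measuring on a copy first.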

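Without the index, MyISAM scans the .MYD file (in whatever order the rows happen to be stored). This is more efficient than bouncing back and forth between the index and the data.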

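Normally (without FORCE INDEX), the optimizer decides to do a table scan if the estimated number of rows to look up via the index is more than 20% or so of the table. That is, the optimizer normally does the "right thing" by ignoring the index.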
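As a rough check of that rule of thumb against the numbers in the question, the sketch below divides the `rows` estimate from the forced range plan by the `rows` estimate from the full-scan plan, then compares the two plans side by side. Both figures are optimizer estimates taken from the EXPLAIN output above, not exact counts.

```sql
-- Estimated fraction of the table the range scan would have to visit,
-- using the two "rows" values from the EXPLAIN output in the question.
SELECT 10604250 / 12978097 AS estimated_fraction;  -- roughly 0.82

-- Plan the optimizer picks on its own (type = ALL in the question).
EXPLAIN SELECT COUNT(*), SUM(`value`) AS `revenue`
FROM `visit_logs`
WHERE id > 13197856;

-- Plan when the index is forced (type = range in the question).
EXPLAIN SELECT COUNT(*), SUM(`value`) AS `revenue`
FROM `visit_logs` FORCE INDEX (PRIMARY)
WHERE id > 13197856;
```

At roughly 82% of the table, the estimate is far above the 20% or so threshold, which is consistent with the optimizer preferring the table scan.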

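If that does not explain all of the difference, then I suspect caching differences: caching of the data blocks (.MYD) in the OS and/or caching of index blocks in the key_buffer.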
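A minimal sketch for checking the caching explanation, assuming you can run statements on the server between timing runs. None of these statements appear in the original post; they are standard MySQL statements that are available in 5.1.

```sql
-- Keep the query cache out of the measurement
-- (query_cache_size is 1G in the posted configuration).
SELECT SQL_NO_CACHE COUNT(*), SUM(`value`) AS `revenue`
FROM `visit_logs`
WHERE id > 13197856;

-- Key cache counters: Key_read_requests counts index block requests,
-- Key_reads counts the requests that had to go to disk.
-- Compare the values before and after a run.
SHOW GLOBAL STATUS LIKE 'Key_read%';

-- Optionally preload the table's MyISAM indexes into the key cache
-- so that a timing run starts with warm index blocks.
LOAD INDEX INTO CACHE `visit_logs`;
```

The OS cache for the .MYD file is not visible from inside MySQL, so running each query several times and comparing the later runs is the simplest way to account for it.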

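Note the difference between MyISAM and InnoDB: in MyISAM, the PRIMARY KEY is stored like any other index. In InnoDB, it is clustered with the data, so a range scan for WHERE id > ... is a range scan over the data itself. That is more efficient, and you would not see the 10-second timings. (There are other significant differences.)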
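To try the clustered-index behaviour without touching the live table, one option is to copy the data into an InnoDB table and rerun the query there. This is only a sketch: the table name `visit_logs_innodb` is hypothetical, and it assumes InnoDB is enabled on this 5.1 server and that `innodb_buffer_pool_size` (not set in the posted configuration) is sized sensibly.

```sql
-- Hypothetical copy of the table; the name is an assumption.
CREATE TABLE `visit_logs_innodb` LIKE `visit_logs`;
ALTER TABLE `visit_logs_innodb` ENGINE=InnoDB;

-- Copying this many rows will take a while.
INSERT INTO `visit_logs_innodb` SELECT * FROM `visit_logs`;

-- With the PRIMARY KEY clustered, a range on id reads the rows in place.
EXPLAIN SELECT COUNT(*), SUM(`value`) AS `revenue`
FROM `visit_logs_innodb`
WHERE id > 13197856;
```

InnoDB differs from MyISAM in other ways as well (locking, disk footprint, the cost of carrying many secondary indexes), so timings on the copy are a hint rather than a guarantee.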

Source: https://dba.stackexchange.com/questions/136362