MySQL

How can I improve query speed on a table with more than 20 million rows?

  • September 16, 2012

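I have a query that retrieves internet traffic statistics for certain IP addresses.

There are individual IP addresses, called hosts, and IP blocks, called assignments. The data is stored at 5-minute intervals.

The query results are grouped on the time column, and the total SUM of inbound and outbound bytes within each 5-minute interval is used to draw a graph.

The table is called traffic and contains (at the end of the month) about 21 million records.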

SHOW CREATE table traffic:
CREATE TABLE `traffic` (
 `type` enum('v4_assignment','v4_host','v6_subnet','v6_assignment','v6_host') NOT NULL,
 `type_id` int(11) unsigned NOT NULL,
 `time` int(32) unsigned NOT NULL,
 `bytesin` bigint(20) unsigned NOT NULL default '0',
 `bytesout` bigint(20) unsigned NOT NULL default '0',
 KEY `basic_select` (`type_id`,`time`,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The query:

SELECT traffic.time, SUM(traffic.bytesin), SUM(traffic.bytesout) FROM traffic 
WHERE (
   ( traffic.type = 'v4_assignment' AND type_id IN (231, between 20 to 100 ids,265)) OR 
   ( traffic.type = 'v4_host' AND type_id IN (131, ... a lot of ids... ,1506))) 
   AND traffic.time >= 1343772000 AND traffic.time < 1346450399 
GROUP BY traffic.time
ORDER BY traffic.time;

Here is the EXPLAIN output for the above query:

+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
| id | select_type | table   | type  | possible_keys | key          | key_len | ref  | rows   | Extra                                        |
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | traffic | range | basic_select  | basic_select | 8       | NULL | 891319 | Using where; Using temporary; Using filesort |
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+

show indexes from traffic;
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table   | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| traffic |          1 | basic_select |            1 | type_id     | A         |       13835 |     NULL | NULL   |      | BTREE      |         |
| traffic |          1 | basic_select |            2 | time        | A         |    18470357 |     NULL | NULL   |      | BTREE      |         |
| traffic |          1 | basic_select |            3 | type        | A         |    18470357 |     NULL | NULL   |      | BTREE      |         |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

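This query takes anywhere from 30 seconds to 30 minutes to complete. I was hoping I could improve things with better indexes, or with a different query, but I can't figure it out.

Update:

Following the suggestions of the helpful commenters, I created a primary key and added the index traffic_pk (time, type, type_id, id). Unfortunately, it turned out that the cardinality of this new index is equal to or lower than that of my original index (basic_select), and MySQL still uses my original key.

Update 2: I dropped the original index basic_select, and EXPLAIN now shows a higher rows value but fewer steps in the Extra column. Query execution time also dropped to under a minute! (Still a bit too slow, but a major improvement!)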

mysql> SHOW CREATE TABLE traffic_test \G;
*************************** 1. row ***************************
      Table: traffic_test
Create Table: CREATE TABLE `traffic_test` (
 `traffic_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `type` enum('v4_assignment','v4_host','v6_subnet','v6_assignment','v6_host') NOT NULL,
 `type_id` int(11) unsigned NOT NULL,
 `time` int(32) unsigned NOT NULL,
 `bytesin` bigint(20) unsigned NOT NULL DEFAULT '0',
 `bytesout` bigint(20) unsigned NOT NULL DEFAULT '0',
 PRIMARY KEY (`time`,`type`,`type_id`,`traffic_id`),
 KEY `traffic_id_IDX` (`traffic_id`)
) ENGINE=InnoDB AUTO_INCREMENT=24545159 DEFAULT CHARSET=latin1

Indexes on the table:

mysql> SHOW INDEX FROM traffic;
+--------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table        | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| traffic_test |          0 | PRIMARY        |            1 | time        | A         |          18 |     NULL | NULL   |      | BTREE      |         |
| traffic_test |          0 | PRIMARY        |            2 | type        | A         |       38412 |     NULL | NULL   |      | BTREE      |         |
| traffic_test |          0 | PRIMARY        |            3 | type_id     | A         |    24545609 |     NULL | NULL   |      | BTREE      |         |
| traffic_test |          0 | PRIMARY        |            4 | traffic_id  | A         |    24545609 |     NULL | NULL   |      | BTREE      |         |
| traffic_test |          1 | traffic_id_IDX |            1 | traffic_id  | A         |    24545609 |     NULL | NULL   |      | BTREE      |         |
+--------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

I also simplified the query by not using OR:

SELECT SQL_NO_CACHE traffic.time, SUM(traffic.bytesin), SUM(traffic.bytesout) 
FROM    traffic
WHERE traffic.type LIKE 'v4_host' AND type_id IN (131,1974,1976,1514,1516,2767,2730,2731,2732,2733,2734,2769,2994,2709,1,4613,4614,4615,4616,326,1520,2652,1518,1521,1522,1523,1524,1525,2203,1515,1513,1467,1508,1973,1510,1975,1511,1475,1476,1468,1469,1470,1471,1472,1473,1500,1507,1478,1480,1481,1482,1483,1484,1485,1479,1486,1487,1488,1489,1490,1491,1495,1499,1494,2269,1474,1519,2204,2976,1922,1493,1492,1497,1496,1498,1501,1502,1503,1526,1509,1506) 
AND traffic.time >= 1342181721 
AND traffic.time < 1343391321 
GROUP BY traffic.time ASC;

Old execution time of this query:

3980 rows in set (6 min 15.27 sec)

New execution time:

3980 rows in set (24.80 sec)

EXPLAIN output:

+----+-------------+---------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+---------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | traffic | range | PRIMARY       | PRIMARY | 4       | NULL | 12272804 | Using where |
+----+-------------+---------+-------+---------------+---------+---------+------+----------+-------------+

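The rows value is still high. I think I could improve this by switching the order of type and type_id in the index, because there are only 4 possible types and many more type_ids.

Is this a correct assumption?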
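
For reference, the reordering described here could be tried on traffic_test roughly as follows. This is only a sketch of the experiment, not a claim that it helps: it assumes the traffic_test definition shown above, and the short id list in the EXPLAIN is a placeholder rather than the real list.

```sql
-- Swap type and type_id inside the primary key.
-- InnoDB rebuilds the whole clustered index for this, so on roughly
-- 24 million rows the statement can run for a long time.
ALTER TABLE traffic_test
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (`time`, `type_id`, `type`, `traffic_id`);

-- Compare the plan (rows, key_len, Extra) with the EXPLAIN output above.
EXPLAIN
SELECT time, SUM(bytesin), SUM(bytesout)
FROM traffic_test
WHERE type = 'v4_host'
  AND type_id IN (131, 1974, 1976)   -- placeholder subset of ids
  AND time >= 1342181721
  AND time <  1343391321
GROUP BY time;
```

The secondary index traffic_id_IDX stays in place, which keeps the AUTO_INCREMENT column as the first column of an index while the primary key is being redefined.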

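1. Table partitioning

Because of the "AND traffic.time >= 1343772000 AND traffic.time < 1346450399" clause, I guess that you never delete data from this table, or that the table currently stores several months of data. The values in the time column appear to be Unix timestamps (1346450399 = Fri, 31 Aug 2012 21:59:59 GMT). Partition the table on the time column. This will speed up data retrieval, because the database will scan only the corresponding partitions (much faster than scanning the whole table).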
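
A minimal sketch of what range partitioning on time could look like, assuming one partition per month. The partition names and boundary values are illustrative assumptions: the boundaries take the month edges from the WHERE clause in the question, rounded up to the full hour and extended by one more month. The statement rebuilds the whole table, so it is best tried on a copy first.

```sql
-- Assumed monthly boundaries; adjust to the real retention period.
ALTER TABLE traffic
PARTITION BY RANGE (`time`) (
    PARTITION p_before_2012_08 VALUES LESS THAN (1343772000),
    PARTITION p2012_08         VALUES LESS THAN (1346450400),
    PARTITION p2012_09         VALUES LESS THAN (1349042400),
    PARTITION p_future         VALUES LESS THAN MAXVALUE
);

-- A query restricted to one month should then touch a single partition.
EXPLAIN
SELECT time, SUM(bytesin), SUM(bytesout)
FROM traffic
WHERE time >= 1343772000 AND time < 1346450399
GROUP BY time;
```

MySQL requires every primary or unique key on a partitioned table to include the partitioning column. The original traffic table has no such key, so the statement applies as written; a table with a primary key would need time in that key. On MySQL 5.1 to 5.6, EXPLAIN PARTITIONS lists the partitions a query reads; later versions show a partitions column in plain EXPLAIN.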

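2. Rewrite the query

Because of the "OR" in the WHERE block, the optimizer may choose not to use the defined index. Try splitting the query into 2 selects and combining them with a union.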

SELECT 
   traffic.time, 
   SUM(traffic.bytesin), 
   SUM(traffic.bytesout) 
FROM 
   traffic 
WHERE traffic.type LIKE 'v4_assignment' 
   AND type_id IN (1,2,3,4)
   AND traffic.time >= 1343772000 AND traffic.time <= 1346450399 
GROUP BY 
   traffic.time
UNION
SELECT 
   traffic.time, 
   SUM(traffic.bytesin), 
   SUM(traffic.bytesout) 
FROM 
   traffic 
WHERE traffic.type LIKE 'v4_host' 
   AND type_id IN (5,6,7,8)
   AND traffic.time >= 1343772000 AND traffic.time <= 1346450399 
GROUP BY 
   traffic.time
ORDER BY 
   time;
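
Note that a union of two grouped selects returns up to two rows per timestamp, one from each branch, whereas the original query returned a single total per interval. One way to get back to a single row per interval, sketched here with the same placeholder id lists, is to wrap both branches in a derived table and sum again. The alias names (per_type, bytesin_sum, bytesout_sum) are assumptions for illustration.

```sql
SELECT
    per_type.time,
    SUM(per_type.bytesin_sum)  AS bytesin_sum,
    SUM(per_type.bytesout_sum) AS bytesout_sum
FROM (
    -- Branch 1: assignments only
    SELECT time, SUM(bytesin) AS bytesin_sum, SUM(bytesout) AS bytesout_sum
    FROM traffic
    WHERE type = 'v4_assignment'
      AND type_id IN (1,2,3,4)          -- placeholder ids
      AND time >= 1343772000 AND time <= 1346450399
    GROUP BY time

    UNION ALL                           -- keep both rows even if they are identical

    -- Branch 2: hosts only
    SELECT time, SUM(bytesin) AS bytesin_sum, SUM(bytesout) AS bytesout_sum
    FROM traffic
    WHERE type = 'v4_host'
      AND type_id IN (5,6,7,8)          -- placeholder ids
      AND time >= 1343772000 AND time <= 1346450399
    GROUP BY time
) AS per_type
GROUP BY per_type.time
ORDER BY per_type.time;
```

UNION ALL is used instead of UNION because plain UNION removes duplicate rows, which would silently drop one branch whenever both happen to produce the same sums for a timestamp.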

3. New index based on data cardinality

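Based on your EXPLAIN output, the index does not appear to be used effectively. The optimizer may have decided that scanning the table is easier (cheaper) than following the index. Also, in your current index, the first column has a lower cardinality than the next 2 columns. As a rule of thumb, the first column in an index should be the one with the best (highest) cardinality.

Create a new index as:

mysql> CREATE INDEX MTIhai_traffic_idx1 ON traffic(time, type, type_id);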
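
Since the first update in the question reports that MySQL kept choosing the old key, an index hint is a quick way to compare the two plans. A sketch, assuming the index above has been created; the id list is a placeholder.

```sql
-- Force the new index and inspect rows, key_len and Extra.
EXPLAIN
SELECT time, SUM(bytesin), SUM(bytesout)
FROM traffic FORCE INDEX (MTIhai_traffic_idx1)
WHERE type = 'v4_host'
  AND type_id IN (5,6,7,8)              -- placeholder ids
  AND time >= 1343772000 AND time < 1346450399
GROUP BY time;
```

Running the same EXPLAIN without the FORCE INDEX clause shows which key the optimizer prefers on its own, so the two outputs can be compared side by side.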

Source: https://dba.stackexchange.com/questions/22979