How to improve query speed on a table with over 20 million rows?

September 16, 2012

I have a query that fetches Internet traffic statistics for certain IP addresses.

There are individual IP addresses, called hosts, and IP blocks, called assignments. The data is stored in 5-minute intervals.

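The query results are grouped on the time column, and the total SUM of traffic in and out for each 5-minute interval is used to draw a graph.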

The table is called traffic and contains (by the end of the month) about 21 million records.

SHOW CREATE table traffic:
CREATE TABLE `traffic` (
 `type` enum('v4_assignment','v4_host','v6_subnet','v6_assignment','v6_host') NOT NULL,
 `type_id` int(11) unsigned NOT NULL,
 `time` int(32) unsigned NOT NULL,
 `bytesin` bigint(20) unsigned NOT NULL default '0',
 `bytesout` bigint(20) unsigned NOT NULL default '0',
 KEY `basic_select` (`type_id`,`time`,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

SELECT traffic.time, SUM(traffic.bytesin), SUM(traffic.bytesout) FROM traffic 
WHERE (
   ( traffic.type = 'v4_assignment' AND type_id IN (231, between 20 and 100 ids, 265)) OR 
   ( traffic.type = 'v4_host' AND type_id IN (131, ... a lot of ids... ,1506))) 
   AND traffic.time >= 1343772000 AND traffic.time < 1346450399 
GROUP BY traffic.time
ORDER BY traffic.time;
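In plain terms, the query keeps rows whose (type, type_id) pair appears in one of the two id lists and whose time falls inside the range, then adds up bytesin and bytesout per 5-minute timestamp. A minimal in-memory Python sketch of that logic, using made-up rows and placeholder id sets (the real id lists are elided above):

```python
from collections import defaultdict

# Hypothetical sample rows: (type, type_id, time, bytesin, bytesout).
ROWS = [
    ("v4_assignment", 231, 1343772000, 100, 10),
    ("v4_host",       131, 1343772000,  50,  5),
    ("v4_host",       999, 1343772000,  70,  7),  # type_id not in the wanted list
    ("v4_assignment", 265, 1343772300, 200, 20),
]
# Placeholder id sets; the real lists are elided in the question.
WANTED = {"v4_assignment": {231, 265}, "v4_host": {131, 1506}}

def traffic_per_interval(rows, wanted, lo, hi):
    """Mirror of the WHERE / GROUP BY / ORDER BY logic: filter, then sum per time."""
    totals = defaultdict(lambda: [0, 0])
    for typ, type_id, t, b_in, b_out in rows:
        # (type = X AND type_id IN (...)) OR (...), plus the half-open time range
        if type_id in wanted.get(typ, ()) and lo <= t < hi:
            totals[t][0] += b_in
            totals[t][1] += b_out
    # One row per timestamp, ordered by time
    return [(t, b_in, b_out) for t, (b_in, b_out) in sorted(totals.items())]

print(traffic_per_interval(ROWS, WANTED, 1343772000, 1346450399))
# [(1343772000, 150, 15), (1343772300, 200, 20)]
```

Every selected row is visited once and contributes to exactly one time bucket, which is why the cost of the real query grows with the number of rows the index lets through.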

Here is the EXPLAIN output for the query above:

+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
| id | select_type | table   | type  | possible_keys | key          | key_len | ref  | rows   | Extra                                        |
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | traffic | range | basic_select  | basic_select | 8       | NULL | 891319 | Using where; Using temporary; Using filesort |
+----+-------------+---------+-------+---------------+--------------+---------+------+--------+----------------------------------------------+

show indexes from traffic;
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table   | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| traffic |          1 | basic_select |            1 | type_id     | A         |       13835 |     NULL | NULL   |      | BTREE      |         |
| traffic |          1 | basic_select |            2 | time        | A         |    18470357 |     NULL | NULL   |      | BTREE      |         |
| traffic |          1 | basic_select |            3 | type        | A         |    18470357 |     NULL | NULL   |      | BTREE      |         |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

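This query takes anywhere from 30 seconds to 30 minutes to complete. I was hoping to improve things with a better index or a different query, but I can't figure it out.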

Update:

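Following the suggestions of helpful commenters, I created a primary key and added the index traffic_pk (time, type, type_id, id). Unfortunately, it turns out that this new index has a cardinality equal to or lower than my original index (basic_select), and MySQL still uses my original key.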

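Update 2: I dropped the original index basic_select. EXPLAIN now shows a higher rows value, but fewer steps in the Extra column. The query execution time has also dropped below one minute! (Still a bit too slow, but a major improvement.)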

mysql> SHOW CREATE TABLE traffic_test \G;
*************************** 1. row ***************************
      Table: traffic_test
Create Table: CREATE TABLE `traffic_test` (
 `traffic_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `type` enum('v4_assignment','v4_host','v6_subnet','v6_assignment','v6_host') NOT NULL,
 `type_id` int(11) unsigned NOT NULL,
 `time` int(32) unsigned NOT NULL,
 `bytesin` bigint(20) unsigned NOT NULL DEFAULT '0',
 `bytesout` bigint(20) unsigned NOT NULL DEFAULT '0',
 PRIMARY KEY (`time`,`type`,`type_id`,`traffic_id`),
 KEY `traffic_id_IDX` (`traffic_id`)
) ENGINE=InnoDB AUTO_INCREMENT=24545159 DEFAULT CHARSET=latin1

Indexes on the table:

mysql> SHOW INDEX FROM traffic;
+--------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table        | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| traffic_test |          0 | PRIMARY        |            1 | time        | A         |          18 |     NULL | NULL   |      | BTREE      |         |
| traffic_test |          0 | PRIMARY        |            2 | type        | A         |       38412 |     NULL | NULL   |      | BTREE      |         |
| traffic_test |          0 | PRIMARY        |            3 | type_id     | A         |    24545609 |     NULL | NULL   |      | BTREE      |         |
| traffic_test |          0 | PRIMARY        |            4 | traffic_id  | A         |    24545609 |     NULL | NULL   |      | BTREE      |         |
| traffic_test |          1 | traffic_id_IDX |            1 | traffic_id  | A         |    24545609 |     NULL | NULL   |      | BTREE      |         |
+--------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

I also simplified the query by not using OR:

SELECT SQL_NO_CACHE traffic.time, SUM(traffic.bytesin), SUM(traffic.bytesout) 
FROM    traffic
WHERE traffic.type LIKE 'v4_host' AND type_id IN (131,1974,1976,1514,1516,2767,2730,2731,2732,2733,2734,2769,2994,2709,1,4613,4614,4615,4616,326,1520,2652,1518,1521,1522,1523,1524,1525,2203,1515,1513,1467,1508,1973,1510,1975,1511,1475,1476,1468,1469,1470,1471,1472,1473,1500,1507,1478,1480,1481,1482,1483,1484,1485,1479,1486,1487,1488,1489,1490,1491,1495,1499,1494,2269,1474,1519,2204,2976,1922,1493,1492,1497,1496,1498,1501,1502,1503,1526,1509,1506) 
AND traffic.time >= 1342181721 
AND traffic.time < 1343391321 
GROUP BY traffic.time ASC;

Execution time of this query before the change:

3980 rows in set (6 min 15.27 sec)

New execution time:

3980 rows in set (24.80 sec)

EXPLAIN output:

+----+-------------+---------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows     | Extra       |
+----+-------------+---------+-------+---------------+---------+---------+------+----------+-------------+
|  1 | SIMPLE      | traffic | range | PRIMARY       | PRIMARY | 4       | NULL | 12272804 | Using where |
+----+-------------+---------+-------+---------------+---------+---------+------+----------+-------------+
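The key_len column shows how many bytes of the chosen index MySQL uses for the lookup. Assuming the usual key sizes for these NOT NULL columns (4 bytes for an INT, 1 byte for an ENUM with fewer than 256 members), key_len can be mapped back to the leading index columns actually used. A small Python sketch of that mapping; the size table is an assumption stated in the code:

```python
# Assumed per-column key sizes for these NOT NULL columns:
# INT UNSIGNED -> 4 bytes, ENUM with fewer than 256 members -> 1 byte.
KEY_BYTES = {"type_id": 4, "time": 4, "type": 1, "traffic_id": 4}

def used_prefix(index_columns, key_len):
    """Return the leading index columns whose sizes add up to key_len."""
    total = 0
    for i, col in enumerate(index_columns):
        total += KEY_BYTES[col]
        if total == key_len:
            return index_columns[: i + 1]
    raise ValueError(f"key_len {key_len} matches no prefix of {index_columns}")

print(used_prefix(["type_id", "time", "type"], 8))                 # ['type_id', 'time']
print(used_prefix(["time", "type", "type_id", "traffic_id"], 4))  # ['time']
```

Under these assumptions, key_len = 8 in the first EXPLAIN suggests basic_select was used only on (type_id, time), and key_len = 4 here suggests only time from the PRIMARY key drives the range scan, with type and type_id checked row by row. That fits the large rows estimate.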

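The rows value is still very high. I think I could improve this by switching the order of type and type_id in the index, since there are only 4 possible types and many more type_ids.

Is this a correct assumption?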
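One way to test the reasoning in this question is to count how many index entries a range scan has to read under different column orders. The sketch below models each composite B-tree index as a sorted Python list of key tuples and uses bisect to find the scan boundaries. The table size, id values, and time window are hypothetical, not the asker's data, and real InnoDB costs also depend on page layout and the optimizer's estimates.

```python
from bisect import bisect_left

# Hypothetical toy data, NOT the real table: every (type, type_id) pair
# reports once per 5-minute step.
TYPES = ("v4_assignment", "v4_host")
N_IDS = 200          # assumed type_id values per type
N_STEPS = 500        # assumed number of 5-minute samples
START = 1342181700   # arbitrary 5-minute-aligned start time

rows = [(START + 300 * step, typ, type_id)
        for step in range(N_STEPS)
        for typ in TYPES
        for type_id in range(1, N_IDS + 1)]

# Each composite index modelled as a sorted list of key tuples.
idx_time_first = sorted((t, typ, tid) for t, typ, tid in rows)  # (time, type, type_id)
idx_time_last = sorted((typ, tid, t) for t, typ, tid in rows)   # (type, type_id, time)

def scan_time_first(index, typ, ids, lo, hi):
    """Range on the leading column: read every entry with lo <= time < hi,
    then filter type/type_id entry by entry. Returns (read, matched)."""
    start, stop = bisect_left(index, (lo,)), bisect_left(index, (hi,))
    matched = sum(1 for _, e_typ, e_id in index[start:stop]
                  if e_typ == typ and e_id in ids)
    return stop - start, matched

def scan_time_last(index, typ, ids, lo, hi):
    """Equality on (type, type_id): one seek per id, then read only that id's
    time range. Every entry read is a match. Returns (read, matched)."""
    read = 0
    for tid in sorted(ids):
        read += bisect_left(index, (typ, tid, hi)) - bisect_left(index, (typ, tid, lo))
    return read, read

wanted = {3, 17, 42, 99}
lo, hi = START + 300 * 100, START + 300 * 200  # 100 five-minute steps

print(scan_time_first(idx_time_first, "v4_host", wanted, lo, hi))  # (40000, 400)
print(scan_time_last(idx_time_last, "v4_host", wanted, lo, hi))    # (400, 400)
```

In this toy model, what changes the amount of work is where the range column sits rather than cardinality as such: with time first, every entry in the time window is read and filtered, while with the equality columns first and time last, each id gets its own narrow range. Swapping only type and type_id behind a leading time column would not change the read count here.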

1. Table partitioning
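Given the AND traffic.time >= 1343772000 AND traffic.time < 1346450399 clause, I assume you never delete data from this table, or that the table currently holds several months of data. The values in the time column appear to be Unix timestamps (1346450399 = Fri, 31 Aug 2012 21:59:59 GMT). Partition the table on the time column. This will speed up data retrieval, because the database will only scan the relevant partition(s), which is much faster than scanning the whole table.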
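The timestamp reading above can be checked with Python's standard datetime module. A minimal sketch that decodes both bounds from the query's WHERE clause in UTC:

```python
from datetime import datetime, timezone

START, END = 1343772000, 1346450399   # bounds from the query's WHERE clause

start_utc = datetime.fromtimestamp(START, tz=timezone.utc)
end_utc = datetime.fromtimestamp(END, tz=timezone.utc)

print(start_utc.isoformat())            # 2012-07-31T22:00:00+00:00
print(end_utc.isoformat())              # 2012-08-31T21:59:59+00:00
print(end_utc.strftime("%A"))           # Friday
print(round((END - START) / 86400, 3))  # 31.0 days covered by the range
```

The start bound being 22:00 UTC suggests, though the question does not say so, that the data is bucketed by local months at UTC+2 (Central European Summer Time), so the range is the month of August 2012 in that time zone.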
A great partitioning tutorial can be found here: http://www.arachna.com/roller/spidaman/entry/scaling_rails_with_mysql_table
You will need to calculate the timestamp ranges for this, but that shouldn't be hard.
For example: (1346450399 - 1343772000) / 60 / 60 / 24 ≈ 31 days. September has 30 days, so the maximum value for the partition holding September's data would be 1346450399 + (30 * 24 * 60 * 60) = 1349042399.
A Unix timestamp to date converter can be found here: http://www.onlineconversion.com/unix_time.htm
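Instead of adding day counts by hand, the partition boundaries can be generated from calendar months, which handles 30- and 31-day months automatically. The sketch below assumes the data follows a fixed UTC+2 offset (inferred from 1343772000 being midnight on 1 August 2012 at UTC+2) and ignores daylight-saving changes; the partition names such as p201208 are hypothetical. It builds a MySQL ALTER TABLE ... PARTITION BY RANGE statement. MySQL requires every unique key, including the primary key, to contain the partitioning column, which the PRIMARY KEY (time, type, type_id, traffic_id) of traffic_test does.

```python
from datetime import datetime, timedelta, timezone

# Assumption: months are bucketed at a fixed UTC+2 offset (no DST handling).
CEST = timezone(timedelta(hours=2))

def month_start(year, month, tz):
    """Unix timestamp of 00:00 on the first day of the given month in tz."""
    return int(datetime(year, month, 1, tzinfo=tz).timestamp())

def monthly_partitions(year, month, count, tz):
    """Yield (partition_name, less_than) pairs, one per calendar month."""
    for _ in range(count):
        nxt_year, nxt_month = (year + 1, 1) if month == 12 else (year, month + 1)
        # Each partition holds rows up to (not including) the next month's start.
        yield f"p{year}{month:02d}", month_start(nxt_year, nxt_month, tz)
        year, month = nxt_year, nxt_month

def partition_ddl(table, year, month, count, tz):
    """Build RANGE partitioning DDL with a catch-all pmax partition at the end."""
    parts = [f"  PARTITION {name} VALUES LESS THAN ({bound})"
             for name, bound in monthly_partitions(year, month, count, tz)]
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")
    return (f"ALTER TABLE {table} PARTITION BY RANGE (`time`) (\n"
            + ",\n".join(parts) + "\n);")

bounds = dict(monthly_partitions(2012, 8, 2, CEST))
print(bounds)  # {'p201208': 1346450400, 'p201209': 1349042400}
print(partition_ddl("traffic", 2012, 8, 2, CEST))
```

Using VALUES LESS THAN with the start of the next month keeps the bounds exclusive, so p201209 covers everything up to 1349042399, the last second of September at UTC+2.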
2. Rewrite the query
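Because of the OR in the WHERE clause, the optimizer may decide not to use the defined index. Try splitting the query into 2 SELECTs and combining them with a union.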
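SELECT 
   u.time, 
   SUM(u.bytesin) AS bytesin, 
   SUM(u.bytesout) AS bytesout
FROM (
   -- per-time sums for the assignments
   SELECT 
      traffic.time, 
      SUM(traffic.bytesin) AS bytesin, 
      SUM(traffic.bytesout) AS bytesout
   FROM 
      traffic 
   WHERE traffic.type = 'v4_assignment' 
      AND type_id IN (1,2,3,4)
      AND traffic.time >= 1343772000 AND traffic.time <= 1346450399 
   GROUP BY 
      traffic.time
   -- UNION ALL keeps both branches' rows; plain UNION would drop identical rows
   UNION ALL
   -- per-time sums for the hosts
   SELECT 
      traffic.time, 
      SUM(traffic.bytesin) AS bytesin, 
      SUM(traffic.bytesout) AS bytesout
   FROM 
      traffic 
   WHERE traffic.type = 'v4_host' 
      AND type_id IN (5,6,7,8)
      AND traffic.time >= 1343772000 AND traffic.time <= 1346450399 
   GROUP BY 
      traffic.time
) AS u
-- add the two branches together so each time slot yields a single row
GROUP BY 
   u.time
ORDER BY 
   u.time;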
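Combining two grouped SELECTs needs some care: with a plain UNION, a time slot present in both branches comes back as two rows instead of one total, and UNION's duplicate removal can silently drop a row when both branches produce identical (time, bytesin, bytesout) values. UNION ALL followed by an outer GROUP BY gives one total per time slot. The sketch below shows the difference using Python's built-in sqlite3 module as a stand-in for MySQL (UNION, UNION ALL and GROUP BY behave the same way for this point); the rows are made up.

```python
import sqlite3

# SQLite (in memory) as a stand-in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (type TEXT, type_id INTEGER, time INTEGER,"
             " bytesin INTEGER, bytesout INTEGER)")
conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?, ?)", [
    ("v4_assignment", 1, 300, 10, 1),
    ("v4_host",       5, 300, 10, 1),  # same time slot AND same sums as above
    ("v4_assignment", 2, 600, 20, 2),
    ("v4_host",       6, 600, 30, 3),
])

# One grouped branch per type, with placeholder id lists.
BRANCH = ("SELECT time, SUM(bytesin) AS bytesin, SUM(bytesout) AS bytesout "
          "FROM traffic WHERE type = ? AND type_id IN ({ids}) GROUP BY time")
a = BRANCH.format(ids="1, 2, 3, 4")
h = BRANCH.format(ids="5, 6, 7, 8")
params = ("v4_assignment", "v4_host")

# Plain UNION: one row per branch per time slot, and exact duplicates collapse.
plain_union = conn.execute(
    f"{a} UNION {h} ORDER BY time, bytesin", params).fetchall()

# UNION ALL keeps every branch row; the outer GROUP BY adds them per time slot.
reaggregated = conn.execute(
    f"SELECT time, SUM(bytesin), SUM(bytesout) FROM ({a} UNION ALL {h}) AS u "
    "GROUP BY time ORDER BY time", params).fetchall()

print(plain_union)   # [(300, 10, 1), (600, 20, 2), (600, 30, 3)]
print(reaggregated)  # [(300, 20, 2), (600, 50, 5)]
```

With plain UNION the 300 slot lost half its traffic and the 600 slot was split across two rows; the re-aggregated version returns the same totals as the original OR query would.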
New index based on data cardinality
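Based on your EXPLAIN output, I don't see the index being used, probably because the optimizer decided that a full table scan would be easier (cheaper) than following the index. Also, in your current index, the first column has a lower cardinality than the next 2 columns. The first column in any index should be the one with the best (highest) cardinality.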
Create a new index as follows:
mysql> CREATE INDEX MTIhai_traffic_idx1 ON traffic(time, type, type_id);

Source: https://dba.stackexchange.com/questions/22979
