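Is it faster to split a big table into 12 rolling monthly tables and use them for reporting, or to keep the big table and delete rows older than 1 year?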

September 24, 2019

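My colleague wants to split a large, 158M-row statistics table into stats_jan, stats_feb, ... and use UNION to select from them for reports. Is this standard practice, and is it faster than just using the big table and deleting rows older than one year? The table consists of many small rows.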

mysql> describe stats;
+----------------+---------------------+------+-----+---------+----------------+
| Field          | Type                | Null | Key | Default | Extra          |
+----------------+---------------------+------+-----+---------+----------------+
| id             | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| badge_id       | bigint(20) unsigned | NO   | MUL | NULL    |                |
| hit_date       | datetime            | YES  | MUL | NULL    |                |
| hit_type       | tinyint(4)          | YES  |     | NULL    |                |
| source_id      | bigint(20) unsigned | YES  | MUL | NULL    |                |
| fingerprint_id | bigint(20) unsigned | YES  |     | NULL    |                |
+----------------+---------------------+------+-----+---------+----------------+

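I did split the table manually: I copied the rows into the appropriate monthly tables and wrote one huge UNION query. The UNION query took 14 seconds, while the same query against the single table took 4.5 minutes. Why do many smaller tables take so much less time than one big table when the total number of rows is the same?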

create table stats_jan (...);
create table stats_feb (...);
...
create index stats_jan_hit_date_idx on stats_jan (hit_date);
...
insert into stats_jan select * from stats where hit_date >= '2019-01-01' and hit_date < '2019-02-01';
...
delete from stats where hit_date < '2018-09-01';
...
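If the single big table is kept, removing a year of old rows with one DELETE like the one above produces a single very large transaction. A common alternative is to delete in smaller slices that are committed one at a time. The Python sketch below only generates the statements, one per day; the function name, the `first` parameter (which should be the oldest hit_date present, for example from SELECT MIN(hit_date)) and the day-sized slices are assumptions, not something taken from the question.

```python
from datetime import date, timedelta


def daily_delete_sql(first: date, cutoff: date, table: str = "stats") -> list[str]:
    """One DELETE per day in [first, cutoff).

    Hypothetical helper: each statement can be run and committed on its own,
    so no single transaction has to remove a whole year of rows.
    Rows older than `first` are not touched, so pass the oldest hit_date.
    """
    stmts = []
    d = first
    while d < cutoff:
        nxt = d + timedelta(days=1)
        stmts.append(
            f"DELETE FROM {table} WHERE hit_date >= '{d:%Y-%m-%d}' "
            f"AND hit_date < '{nxt:%Y-%m-%d}';"
        )
        d = nxt
    return stmts


if __name__ == "__main__":
    for stmt in daily_delete_sql(date(2018, 8, 30), date(2018, 9, 1)):
        print(stmt)
```

Day-sized slices are only one choice; with very uneven traffic, smaller or larger ranges may be needed to keep each transaction a reasonable size.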

The monthly tables range from 1.7M to 35M rows.
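With 12 rolling monthly tables, each table such as stats_aug is reused every year, so whatever maintains the scheme has to know which table a given hit_date belongs to, which tables a report must UNION, and which table to empty when a new month starts. A minimal Python sketch of that bookkeeping, assuming the three-letter stats_<mon> names from the question; the function names and the TRUNCATE-at-rollover step are hypothetical:

```python
from datetime import date

# Three-letter suffixes matching the stats_jan, stats_feb, ... names in the question.
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def monthly_table(d: date) -> str:
    """Rolling table that holds rows for day d, e.g. 2019-08-21 -> stats_aug."""
    return "stats_" + MONTHS[d.month - 1]


def tables_for_range(first: date, last: date) -> list[str]:
    """Tables a report over [first, last] must UNION; never more than 12."""
    names = []
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month) and len(names) < 12:
        names.append("stats_" + MONTHS[m - 1])
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return names


def rollover_sql(d: date) -> str:
    """Hypothetical step run at the start of d's month, before new inserts:
    the table still holds the same month from a year earlier, so empty it."""
    return f"TRUNCATE TABLE {monthly_table(d)};"
```

For example, a report from 2019-08-21 to 2019-09-24 only needs stats_aug and stats_sep; the other ten tables contribute no rows.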

select host as `key`, count(*) as value from stats join sources on source_id = sources.id where hit_date >= '2019-08-21 19:43:19' and sources.host != 'NONE' group by source_id order by value desc limit 10;
4 min 30.39 sec

flush tables;
reset query cache;

select host as `key`, count(*) as value from stats_jan join sources on source_id = sources.id where hit_date >= '2019-08-21 19:43:19' and sources.host != 'NONE' group by source_id
UNION
...
order by value desc limit 10;
14.16 sec

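Don't split the table. Use range partitioning instead. Study MySQL 8.0 Reference Manual / Partitioning, and use MySQL 8.0 Reference Manual / … / ALTER TABLE Partition Operations. Remember that it is best to create partitions for future periods in advance (and don't forget to create a LESS THAN MAXVALUE partition). Creating new partitions and moving existing data into them at the same time can be much more expensive.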
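A minimal sketch of what the answer describes, written as a Python generator for the MySQL statements rather than as the answerer's own code. It assumes monthly RANGE COLUMNS partitions on hit_date named p<YYYYMM> (a hypothetical convention) plus a pmax catch-all partition. Two MySQL rules force schema changes compared with the describe output above: every unique key, including the primary key, must contain the partitioning column, so the key becomes (id, hit_date); and a primary-key column cannot be NULL, so hit_date becomes NOT NULL. Partitioned InnoDB tables also cannot have foreign keys, so this assumes stats has none.

```python
from datetime import date

# Columns from the question; PRIMARY KEY and hit_date changed as partitioning requires.
COLUMNS = """  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  badge_id BIGINT UNSIGNED NOT NULL,
  hit_date DATETIME NOT NULL,
  hit_type TINYINT,
  source_id BIGINT UNSIGNED,
  fingerprint_id BIGINT UNSIGNED,
  PRIMARY KEY (id, hit_date),
  KEY badge_id_idx (badge_id),
  KEY hit_date_idx (hit_date),
  KEY source_id_idx (source_id)"""


def next_month(d: date) -> date:
    """First day of the month after d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def partition_def(month: date) -> str:
    """Partition for month's rows (hypothetical name p<YYYYMM>)."""
    return (f"PARTITION p{month:%Y%m} "
            f"VALUES LESS THAN ('{next_month(month):%Y-%m-%d}')")


def create_table_sql(first: date, months: int, table: str = "stats") -> str:
    """CREATE TABLE with `months` monthly partitions starting at `first`, plus pmax."""
    parts = []
    m = date(first.year, first.month, 1)
    for _ in range(months):
        parts.append("  " + partition_def(m))
        m = next_month(m)
    # Catch-all for rows beyond the last pre-created month.
    parts.append("  PARTITION pmax VALUES LESS THAN (MAXVALUE)")
    return (f"CREATE TABLE {table} (\n{COLUMNS}\n) ENGINE=InnoDB\n"
            "PARTITION BY RANGE COLUMNS (hit_date) (\n"
            + ",\n".join(parts) + "\n);")


def add_month_sql(month: date, table: str = "stats") -> str:
    """Split one more month off pmax; cheap while pmax holds no rows yet."""
    return (f"ALTER TABLE {table} REORGANIZE PARTITION pmax INTO (\n"
            f"  {partition_def(month)},\n"
            "  PARTITION pmax VALUES LESS THAN (MAXVALUE)\n);")


if __name__ == "__main__":
    print(create_table_sql(date(2019, 9, 1), 3))
    print(add_month_sql(date(2019, 12, 1)))
```

Running add_month_sql a month or more ahead keeps pmax empty, so REORGANIZE PARTITION has no rows to move; that is the cheap case the answer recommends over splitting a partition that already holds data.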
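Don't delete data permanently. Move it into a separate archive table instead. If you don't have enough disk space, back up the archive table, verify the backup, and only drop the table after that succeeds. If necessary (and it will be!), you can restore this data and use it.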
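One way to move old data into an archive table without copying it row by row is ALTER TABLE ... EXCHANGE PARTITION, which swaps a partition with an empty, non-partitioned table of the same structure. The Python sketch below generates that sequence for one month. It assumes stats is range-partitioned by month with partitions named p<YYYYMM> (a hypothetical naming convention), and stats_archive_<YYYYMM> is an invented archive-table name.

```python
from datetime import date


def archive_partition_sql(month: date, table: str = "stats") -> list[str]:
    """Statements that move one monthly partition into its own archive table.

    Assumes hypothetical partition names p<YYYYMM>; the archive table name
    <table>_archive_<YYYYMM> is likewise only an illustration.
    """
    part = f"p{month:%Y%m}"
    archive = f"{table}_archive_{month:%Y%m}"
    return [
        # Same columns and indexes as the partitioned table ...
        f"CREATE TABLE {archive} LIKE {table};",
        # ... but EXCHANGE PARTITION requires a non-partitioned table.
        f"ALTER TABLE {archive} REMOVE PARTITIONING;",
        # Swap: the partition's rows end up in the archive table,
        # and the (empty) archive table's contents end up in the partition.
        f"ALTER TABLE {table} EXCHANGE PARTITION {part} WITH TABLE {archive};",
        # The partition is now empty, so dropping it deletes no data.
        f"ALTER TABLE {table} DROP PARTITION {part};",
    ]


if __name__ == "__main__":
    for stmt in archive_partition_sql(date(2018, 9, 1)):
        print(stmt)
```

After the exchange, back up the archive table (for example with mysqldump), check that the backup restores, and only then DROP TABLE it if disk space is needed, as the answer advises.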

Source: https://dba.stackexchange.com/questions/249456

