選擇一個連接的大型日誌表的性能問題
我有一個大約 1300 萬行的 Log-Tabelle。表結構(截斷)如下:
CREATE TABLE `dmsserver_status_log` ( `id` bigint(20) NOT NULL, `customer_id` int(11) NOT NULL, `timestamp` datetime(6) NOT NULL, `level` varchar(10) NOT NULL, `logger_name` varchar(255) NOT NULL, `message` text NOT NULL, ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- id是 PK 和 AUTO_INCREMENT
- customer_id和 timestamp 各有一個 KEY
- customer_id具有“客戶”表的外鍵
客戶表結構(截斷)如下,大約有 1700 行:
CREATE TABLE `customer` ( `id` int(11) NOT NULL, `number` varchar(10) NOT NULL, [...] ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- id是 PK 和 AUTO_INCREMENT
- 數字是唯一鍵
這些列上有更多列和更多鍵,但我無法想像這與我的性能問題有關(?)
如果我查詢我的日誌表(沒有連接),我會在短時間內收到結果:
SELECT `timestamp`, `level`, logger_name, message FROM dmsserver_status_log as log WHERE `timestamp` > '2020-03-01 00:00:00' AND message LIKE '%fehlgeschlagen%' ORDER BY `timestamp` ASC;
在這個例子中,我在一秒鐘內獲得了 57 次點擊。這對我來說沒問題。如果我在客戶表上使用 JOIN 擴展我的選擇,則相同的選擇需要大約 30 秒或更長時間才能完成:
SELECT `timestamp`, `level`, logger_name, message, customer.`number` FROM dmsserver_status_log as log JOIN customer ON customer.id = log.customer_id WHERE `timestamp` > '2020-03-01 00:00:00' AND message LIKE '%fehlgeschlagen%' ORDER BY `timestamp` ASC;
根據我的查詢和時間戳的範圍,這個查詢可能需要幾分鐘或幾十分鐘。我只是不明白為什麼這個選擇需要這麼長時間才能完成。
這是帶有連接的上述選擇的“解釋擴展”輸出:
id select_type `table` `type` possible_keys `key` key_len `ref` `rows` filtered Extra 1 SIMPLE customer index PRIMARY number 32 1749 100 Using index; Using temporary; Using filesort 1 SIMPLE log ref customer_id,timestamp customer_id 4 customer.id 515 100 Using where
有人可以向我解釋為什麼加入第二張桌子需要這麼多時間嗎?
編輯:
按照這裡的要求,結果是
SHOW CREATE TABLE dmsserver_status_log
CREATE TABLE `dmsserver_status_log` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `customer_id` int(11) NOT NULL, `timestamp` datetime(6) NOT NULL, `thread_name` varchar(255) NOT NULL, `level` varchar(10) NOT NULL, `logger_name` varchar(255) NOT NULL, `source_host` varchar(50) NOT NULL, `message` text NOT NULL, `stacktrace` text, `mdc` text, PRIMARY KEY (`id`), KEY `customer_id` (`customer_id`), KEY `timestamp` (`timestamp`), CONSTRAINT `dmsserver_status_log_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`) ON UPDATE CASCADE ) ENGINE=InnoDB AUTO_INCREMENT=17996215 DEFAULT CHARSET=utf8
這是結果
SHOW CREATE TABLE customer
CREATE TABLE `customer` ( `id` int(11) NOT NULL AUTO_INCREMENT, `number` varchar(10) NOT NULL, `headquarter_customer_number` varchar(10) DEFAULT NULL COMMENT 'Angabe des Hauptbetriebs wenn dies eine Filiale ist', `it_location_customer_number` varchar(10) DEFAULT NULL COMMENT 'Angabe des EDV-Standorts wenn dieser Kunde keine eigene EDV hat', `customer_type_id` int(11) NOT NULL DEFAULT '1', `address_id` int(11) NOT NULL, `characteristic_id` int(11) NOT NULL DEFAULT '1', `iban` varchar(34) DEFAULT NULL, `bic` varchar(11) DEFAULT NULL, `bank` varchar(100) DEFAULT NULL, `newsletter` tinyint(1) NOT NULL DEFAULT '1', `newsletter_email` varchar(100) DEFAULT NULL, `invoice_email` varchar(100) DEFAULT NULL COMMENT 'Email-Adresse für den Empfang von Rechnungen als PDF', `api_password` varchar(255) DEFAULT NULL COMMENT 'Zugangspasswort für den Zugriff auf unsere API', `comments` varchar(250) DEFAULT NULL, `valid_id` smallint(6) NOT NULL, `create_time` datetime NOT NULL, `create_by` int(11) NOT NULL, `change_time` datetime NOT NULL, `change_by` int(11) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `number` (`number`), KEY `change_by` (`change_by`), KEY `characteristic_id` (`characteristic_id`), KEY `customer_type_id` (`customer_type_id`), KEY `address_id` (`address_id`), KEY `valid_id` (`valid_id`), KEY `create_by` (`create_by`), KEY `headquarter_customer_number` (`headquarter_customer_number`), KEY `it_location_customer_number` (`it_location_customer_number`), CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`customer_type_id`) REFERENCES `customer_type` (`id`), CONSTRAINT `customer_ibfk_2` FOREIGN KEY (`address_id`) REFERENCES `address` (`id`), CONSTRAINT `customer_ibfk_3` FOREIGN KEY (`characteristic_id`) REFERENCES `customer_characteristic` (`id`), CONSTRAINT `customer_ibfk_4` FOREIGN KEY (`create_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE, CONSTRAINT `customer_ibfk_5` FOREIGN KEY (`change_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE, CONSTRAINT `customer_ibfk_6` FOREIGN KEY (`headquarter_customer_number`) REFERENCES `customer` (`number`) ON UPDATE CASCADE, CONSTRAINT `customer_ibfk_7` FOREIGN KEY (`it_location_customer_number`) REFERENCES `customer` (`number`) ON UPDATE CASCADE ) ENGINE=InnoDB AUTO_INCREMENT=2540 DEFAULT CHARSET=utf8
請提供
EXPLAIN SELECT ...
每個變體。什麼版本的 MariaDB?(這方面的優化已經改變。)
也嘗試這些變化
這將過濾隔離到派生表中,從而最大限度地減少進入另一個表的需要。
SELECT log.`timestamp`, log.`level`, log.logger_name, log.message, c.`number` FROM ( SELECT log1.customer_id, log1.`timestamp`, log1.`level`, log1.logger_name, log1.message FROM dmsserver_status_log AS log1 WHERE `timestamp` > '2020-03-01 00:00:00' AND message LIKE '%fehlgeschlagen%' ) AS log JOIN customer AS c ON c.id = log.customer_id ORDER BY log.`timestamp` ASC;
這個推遲以不同的方式查看第二張桌子。
SELECT `timestamp`, `level`, logger_name, message, ( SELECT `number` FROM customer WHERE customer.id = log.customer_id ) AS "number" FROM dmsserver_status_log as log JOIN customer ON customer.id = log.customer_id WHERE `timestamp` > '2020-03-01 00:00:00' AND message LIKE '%fehlgeschlagen%' ORDER BY `timestamp` ASC;
根據 MySQL/MariaDB 的版本,這三種變體可能會進行不同的優化。
如果您正在搜尋“單詞”,請考慮使用
FULLTEXT(message)
andMATCH(message) AGAINST ('+fehlgeschlagen' IN BOOLEAN MODE)
。它應該執行得更快。
您在表上
customer_id
和表中都有單獨的索引(又名鍵)。對於任何給定的查詢,MariaDB 通常只能使用一個索引。因此,當您更改查詢以包含連接時,MariaDB 必須選擇要使用的索引之一。timestamp``dmsserver_status_log
相反,您需要的是表上一個合適的複合索引,它可以同時支持子句、the和列中的
dmsserver_status_log
相關條件。WHERE``JOIN``ORDER BY
所以你需要的索引可能是
CREATE INDEX dmsserver_status_log_idx1 ON (customer_id, timestamp)
. 請注意,列的順序很重要。(如果EXPLAIN
顯示優化器不會使用此索引,那麼您可以嘗試使用相反順序的列重新創建它。查詢優化可能很困難……)請注意,
message
也出現在您的WHERE
子句中的列不會從具有特定查詢的複合索引中受益,因為您在搜尋字元串之前有一個萬用字元。如果您只關心特定 JOIN 查詢的良好性能,
customer_id
您可能需要考慮刪除現有的單個索引。timestamp
在一個表中維護大量的索引意味著更慢INSERT
的 s 並且它們也佔用了一些記憶體。作為參考,這裡有一篇關於為 MariaDB/MySQL 建構索引的文章: Building the best INDEX for a given SELECT