Mysql全文搜尋my.cnf優化
我在https://serverfault.com/questions/353888/mysql-full-text-search-cause-high-usage-cpu上提出了一個問題,一些使用者建議在這裡提問。
我們建立了一個新聞網站。每天我們都會從 web api 輸入數以萬計的數據。
為了提供精準的搜尋服務,我們的表格使用了MyISAM,建立了全文索引(標題、內容、日期)。我們的網站正在使用 2GB RAM、30GB 空間(無交換,因為 VDS 不允許建構交換)的 Godaddy VDS 上進行測試。CPU是
Intel(R) Xeon(R) CPU L5609 @ 1.87GHz
執行後
./mysqltuner.pl
我們得到一些結果:
-------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.5.20 [OK] Operating on 32-bit architecture with less than 2GB RAM -------- Storage Engine Statistics ------------------------------------------- [--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 396M (Tables: 39) [--] Data in InnoDB tables: 208K (Tables: 8) [!!] Total fragmented tables: 9 -------- Security Recommendations ------------------------------------------- [!!] User '@ip-XX-XX-XX-XX.ip.secureserver.net' [!!] User '@localhost' -------- Performance Metrics ------------------------------------------------- [--] Up for: 17h 27m 58s (1M q [20.253 qps], 31K conn, TX: 513M, RX: 303M) [--] Reads / Writes: 61% / 39% [--] Total buffers: 168.0M global + 2.7M per thread (151 max threads) [OK] Maximum possible memory usage: 573.8M (28% of installed RAM) [OK] Slow queries: 0% (56/1M) [!!] Highest connection usage: 100% (152/151) [OK] Key buffer size / total MyISAM indexes: 8.0M/162.5M [OK] Key buffer hit rate: 100.0% (2B cached / 882K reads) [!!] Query cache is disabled [OK] Sorts requiring temporary tables: 0% (0 temp sorts / 17K sorts) [!!] Temporary tables created on disk: 49% (32K on disk / 64K total) [!!] Thread cache is disabled [!!] Table cache hit rate: 0% (400 open / 298K opened) [OK] Open file limit used: 41% (421/1K) [!!] Table locks acquired immediately: 77% [OK] InnoDB data size / buffer pool: 208.0K/128.0M -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance MySQL started within last 24 hours - recommendations may be inaccurate Enable the slow query log to troubleshoot bad queries Reduce or eliminate persistent connections to reduce connection usage When making adjustments, make tmp_table_size/max_heap_table_size equal Reduce your SELECT DISTINCT queries without LIMIT clauses Set thread_cache_size to 4 as a starting value Increase table_cache gradually to avoid file descriptor limits Optimize queries and/or use InnoDB to reduce lock wait Variables to adjust: max_connections (> 151) wait_timeout (< 28800) interactive_timeout (< 28800) query_cache_size (>= 8M) tmp_table_size (> 16M) max_heap_table_size (> 16M) thread_cache_size (start at 4) table_cache (> 400)
這是
my.cnf
[mysqld] port = 3306 socket = /tmp/mysql.sock skip-external-locking key_buffer_size = 256M max_allowed_packet = 16M max_connections = 1024 wait_timeout = 5 table_open_cache = 512 sort_buffer_size = 2M read_buffer_size = 2M read_rnd_buffer_size = 2M myisam_sort_buffer_size = 128M thread_cache_size = 8 query_cache_size= 256M # Try number of CPU's*2 for thread_concurrency thread_concurrency = 8 ft_min_word_len = 2 read_rnd_buffer_size=2M tmp_table_size=128M
我不確定如何優化
my.cnf
取決於./mysqltuner.pl
返回結果。
我有一個有趣的驚喜給你。
您可以做的唯一優化全文索引不是在 my.cnf 級別。這與兩件事有關:
- 停用詞列表
- 查詢
停用詞
有543 個停用詞您可能希望或可能不希望從 FULLTEXT 索引中過濾掉。停用詞列表是在編譯時建構的。您可以使用自己的列表覆蓋該列表,如下所示:
好的,現在讓我們創建我們的停用詞列表。我通常將英文文章設置為唯一的停用詞。
echo "a" > /var/lib/mysql/stopwords.txt echo "an" >> /var/lib/mysql/stopwords.txt echo "the" >> /var/lib/mysql/stopwords.txt
接下來,將選項添加到 /etc/my.cnf 加上允許 1 個字母、2 個字母和 3 個字母的單詞
[mysqld] ft_min_word_len=1 ft_stopword_file=/var/lib/mysql/stopwords.txt
最後重啟mysql
service mysql restart
如果您有任何具有 FULLTEXT 索引的表,則必須刪除這些 FULLTEXT 索引並重新創建它們。
詢問
這是關於使用全表索引的 MySQL 查詢的一個鮮為人知的事實:有時 MySQL 查詢優化器完全停止使用 FULLTEXT 索引並執行全表掃描。
這是一個例子:
use test drop table if exists ft_test; create table ft_test ( id int not null auto_increment, txt text, primary key (id), FULLTEXT (txt) ) ENGINE=MyISAM; insert into ft_test (txt) values ('mount camaroon'),('mount camaron'),('mount camnaroon'), ('mount cameroon'),('mount cemeroon'),('mount camnaroon'), ('mount camraon'),('mount camaraon'),('mount camaran'), ('mount camnaraon'),('mount cameroan'),('mount cemeroan'), ('mount camnaraon'),('munt camraon'),('munt camaraon'), ('munt camaran'),('munt camnaraon'),('munt cameroan'), ('munt cemeroan'),('munt camnaraon'),('mount camraan'); select * from ft_test WHERE MATCH(txt) AGAINST ("+mount +cameroon" IN BOOLEAN MODE);
這是載入的範例數據:
mysql> use test Database changed mysql> drop table if exists ft_test; Query OK, 0 rows affected (0.00 sec) mysql> create table ft_test -> ( -> id int not null auto_increment, -> txt text, -> primary key (id), -> FULLTEXT (txt) -> ) ENGINE=MyISAM; Query OK, 0 rows affected (0.03 sec) mysql> insert into ft_test (txt) values -> ('mount camaroon'),('mount camaron'),('mount camnaroon'), -> ('mount cameroon'),('mount cemeroon'),('mount camnaroon'), -> ('mount camraon'),('mount camaraon'),('mount camaran'), -> ('mount camnaraon'),('mount cameroan'),('mount cemeroan'), -> ('mount camnaraon'),('munt camraon'),('munt camaraon'), -> ('munt camaran'),('munt camnaraon'),('munt cameroan'), -> ('munt cemeroan'),('munt camnaraon'),('mount camraan'); Query OK, 21 rows affected (0.00 sec) Records: 21 Duplicates: 0 Warnings: 0 mysql>
這是一個範例查詢及其解釋計劃
mysql> select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE); +----+----------------+ | id | txt | +----+----------------+ | 4 | mount cameroon | +----+----------------+ 1 row in set (0.00 sec) mysql> explain select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ft_test type: fulltext possible_keys: txt key: txt key_len: 0 ref: rows: 1 Extra: Using where 1 row in set (0.00 sec) mysql>
好的,使用了 FULLTEXT 索引。
現在,讓我們稍微改變一下查詢
mysql> select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1; +----+----------------+ | id | txt | +----+----------------+ | 4 | mount cameroon | +----+----------------+ 1 row in set (0.00 sec) mysql> explain select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ft_test type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 21 Extra: Using where 1 row in set (0.00 sec) mysql>
OMG FULLTEXT 索引發生了什麼?MySQL 查詢優化器基本上對它嗤之以鼻。如果您正在使用 ft_test 表執行 JOIN,則一旦發出全文搜尋的 WHERE 子句並且它會執行相同的操作,那麼誰知道查詢的其餘部分會發生什麼。
解決方案是重構查詢並嘗試隔離 FULLTEXT 搜尋並僅收集鍵。然後將這些鍵 LEFT JOIN 加入到原始表中。
例子
SELECT B.* FROM (SELECT id from ft_test WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A LEFT JOIN ft_test B USING (id);
對於這個查詢,這裡是結果和它的解釋
mysql> SELECT B.* -> FROM (SELECT id from ft_test -> WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A -> LEFT JOIN ft_test B USING (id); +----+----------------+ | id | txt | +----+----------------+ | 4 | mount cameroon | +----+----------------+ 1 row in set (0.00 sec) mysql> explain SELECT B.* -> FROM (SELECT id from ft_test -> WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A -> LEFT JOIN ft_test B USING (id)\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> type: system possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1 Extra: *************************** 2. row *************************** id: 1 select_type: PRIMARY table: B type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: *************************** 3. row *************************** id: 2 select_type: DERIVED table: ft_test type: fulltext possible_keys: txt key: txt key_len: 0 ref: rows: 1 Extra: Using where 3 rows in set (0.00 sec) mysql>
請注意,在 EXPLAIN 計劃的 DERIVED2 部分中,確實使用了 FULLTEXT 索引。
這個故事所講的道德
您必須養成決定數據庫將有多少停用詞、創建停用詞列表、配置它,然後創建/重新創建所有 FULLTEXT 索引的習慣。您還必須養成重構 FULLTEXT 搜尋查詢的習慣,以使 MySQL 查詢優化器不會生成錯誤的 EXPLAIN 計劃或使參與 EXPLAIN 計劃的其餘查詢的索引無效。