Sql-Server
Combining full-text and scalar indexes
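Suppose we have a database of 12 million names and addresses that needs to be searched with full-text, but each row also contains an integer value, say COMPANYID. Across those 12 million rows the table contains about 250 distinct COMPANYIDs. When defining the full-text index, is it possible to give each COMPANY its own "branch" in the tree?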
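The short answer is no, and you don't need it. Full-text indexes are inverted indexes: they store the split words (tokens) against a unique doc_id, which comes from the key index you must specify when you create the full-text index. That key must be a "unique, single-key, non-nullable column", preferably an integer. Any foreign keys on the table are essentially irrelevant to how the index is stored, and there is no easy way to partition the full-text index on them.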
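A minimal sketch of that requirement, assuming a hypothetical table `dbo.addresses` (the table, catalog and constraint names are illustrative, not from the question) and a SQL Server instance with Full-Text Search installed. The only thing the full-text index cares about is the unique, single-column, non-nullable key; `companyId` is just an ordinary column alongside it.

```sql
-- Hypothetical table: names are for illustration only.
CREATE TABLE dbo.addresses
(
    addressId   INT IDENTITY NOT NULL,   -- supplies the full-text doc_id
    companyId   INT NOT NULL,            -- ordinary scalar column, not part of the full-text index
    searchTerms NVARCHAR(2048) NOT NULL,

    -- unique, single-column, non-nullable: usable as the KEY INDEX
    CONSTRAINT PK_addresses PRIMARY KEY ( addressId )
);
GO

CREATE FULLTEXT CATALOG ftc_addresses;
GO

-- Only searchTerms is indexed; tokens are stored against addressId values.
CREATE FULLTEXT INDEX ON dbo.addresses ( searchTerms )
    KEY INDEX PK_addresses
    ON ftc_addresses;
GO

-- Once population has finished, this shows each token stored against
-- document_id (the addressId value); companyId appears nowhere.
SELECT display_term, document_id, occurrence_count
FROM sys.dm_fts_index_keywords_by_document( DB_ID(), OBJECT_ID('dbo.addresses') );
```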
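You could fake something like this with one table per company and a full-text index on each table. You would then need some code logic to work out which table to insert into or read from. That would be quite a management headache and almost certainly not worth it.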
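To show the kind of routing logic that approach needs, here is a rough sketch, assuming a hypothetical naming convention of one table per company called `dbo.addresses_<companyId>`, each with its own full-text index on `searchTerms`. The procedure name is also hypothetical. Because a table name cannot be parameterised, it has to be built into dynamic SQL, which is part of the headache.

```sql
-- Hypothetical: assumes tables named dbo.addresses_<companyId>,
-- each with its own full-text index on searchTerms.
CREATE PROCEDURE dbo.searchCompanyAddresses
    @companyId  INT,
    @searchTerm NVARCHAR(4000)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @tableName SYSNAME = N'addresses_' + CAST(@companyId AS NVARCHAR(10));
    DECLARE @sql NVARCHAR(MAX);

    -- Stop if the per-company table does not exist.
    IF OBJECT_ID(N'dbo.' + QUOTENAME(@tableName)) IS NULL
    BEGIN
        RAISERROR('No table for company %d', 16, 1, @companyId);
        RETURN;
    END

    -- The table name is quoted and concatenated; the search term
    -- is still passed as a proper parameter.
    SET @sql = N'SELECT TOP (1000) * FROM dbo.' + QUOTENAME(@tableName)
             + N' WHERE CONTAINS ( searchTerms, @term );';

    EXEC sp_executesql @sql, N'@term NVARCHAR(4000)', @term = @searchTerm;
END
GO

-- Usage:
EXEC dbo.searchCompanyAddresses @companyId = 42, @searchTerm = N'data';
```

Every insert and update path would need the same kind of routing, and every schema change would have to be repeated across roughly 250 tables.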
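If your data volume were very large (say, around 23 billion records), you could look at a sharding solution: for example, one Azure VM per company, with an application in front that decides which machine to connect to. But clearly you don't need that either.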
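The "application in front" usually needs somewhere to look up where each company lives. A minimal sketch, assuming a hypothetical shard-map table `dbo.shardMap` kept on a central server; the application reads it before opening a connection to the right machine.

```sql
-- Hypothetical shard map: one row per company, recording which
-- server and database hold that company's data.
CREATE TABLE dbo.shardMap
(
    companyId    INT     NOT NULL CONSTRAINT PK_shardMap PRIMARY KEY,
    serverName   SYSNAME NOT NULL,
    databaseName SYSNAME NOT NULL
);
GO

-- The application looks up the target for a company, then connects there.
DECLARE @companyId INT = 42;

SELECT serverName, databaseName
FROM dbo.shardMap
WHERE companyId = @companyId;
```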
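SQL 2008 also brought many improvements to full-text, which is now more tightly integrated into the database engine. One scenario, where you specify a WHERE clause against an ordinary column and also use a full-text function, is called a "mixed query" and is discussed here. Even though that information applies to SQL 2008, it is still a great article.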
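As a short illustration of a mixed query, assuming a table shaped like `dbo.yourAddresses` from the test script further down: one relational predicate on `companyId` and one full-text predicate in the same WHERE clause. This leaves the optimizer free to choose how to combine the two, for example by driving from the `companyId` index or from the full-text matches.

```sql
-- Mixed query: an ordinary predicate (companyId) plus a
-- full-text predicate (CONTAINS) in one WHERE clause.
SELECT TOP (100) a.rowId, a.companyId, a.searchTerms
FROM dbo.yourAddresses a
WHERE a.companyId = 42
  AND CONTAINS ( a.searchTerms, N'"data*"' );  -- prefix term: words starting with "data"
```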
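If you are concerned about performance and plans in general, why not spin up some test data, introduce some skew and try it out? I put this script together in a few minutes; it generates roughly 2.6 million rows: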
-- !!TODO introduce some skew
USE master
GO
SET NOCOUNT ON
GO
DBCC TRACEON(610) -- Minimal logging
GO

IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'fullTextDemo' )
BEGIN
    ALTER DATABASE fullTextDemo SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE fullTextDemo
END
GO
IF NOT EXISTS ( SELECT * FROM sys.databases WHERE name = 'fullTextDemo' )
    CREATE DATABASE fullTextDemo
GO
ALTER DATABASE fullTextDemo SET RECOVERY SIMPLE
GO
USE fullTextDemo
GO

IF OBJECT_ID('dbo.yourAddresses') IS NOT NULL DROP TABLE dbo.yourAddresses
IF OBJECT_ID('dbo.companies') IS NOT NULL DROP TABLE dbo.companies
GO
CREATE TABLE dbo.companies
(
    companyId   INT IDENTITY NOT NULL,
    companyName NVARCHAR(50) NOT NULL,

    CONSTRAINT PK_companies PRIMARY KEY ( companyId )
)
GO
CREATE TABLE dbo.yourAddresses
(
    rowId       INT IDENTITY,
    companyId   INT NOT NULL FOREIGN KEY REFERENCES dbo.companies ( companyId ),
    searchTerms NVARCHAR(2048) NOT NULL,

    CONSTRAINT PK_yourAddresses PRIMARY KEY ( rowId )
)
GO

-- Populate the companies
;WITH cte AS (
    SELECT TOP 250 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
    FROM master.sys.columns c1
        CROSS JOIN master.sys.columns c2
        CROSS JOIN master.sys.columns c3
)
INSERT INTO dbo.companies ( companyName )
SELECT NEWID()
FROM cte
GO

-- Generate 2,636,000 records
INSERT dbo.yourAddresses ( companyId, searchTerms )
SELECT c.companyId, m.[text]
FROM dbo.companies c
    CROSS JOIN ( SELECT * FROM sys.messages ) m
WHERE m.language_id = 1033
  AND m.[text] Like '[a-z]%'
GO

CREATE INDEX _idx ON dbo.yourAddresses ( companyId ) INCLUDE ( searchTerms )
GO

-- !!TODO look at compression
--ALTER INDEX PK_yourAddresses ON dbo.yourAddresses REBUILD WITH ( DATA_COMPRESSION = PAGE )
--GO

-- Create the catalog
IF NOT EXISTS ( SELECT * FROM sys.fulltext_catalogs WHERE name = N'ftc_yourAddresses' )
    CREATE FULLTEXT CATALOG ftc_yourAddresses
GO

-- Create the full-text index
CREATE FULLTEXT INDEX ON dbo.yourAddresses ( searchTerms )
    KEY INDEX PK_yourAddresses
    ON ftc_yourAddresses
    WITH CHANGE_TRACKING MANUAL -- CHANGE_TRACKING OFF, NO POPULATION
GO

SELECT 'before' ft, * FROM sys.fulltext_indexes
GO

ALTER FULLTEXT INDEX ON dbo.yourAddresses START FULL POPULATION;
GO

DECLARE @i INT
SET @i = 0

WHILE EXISTS ( SELECT * FROM sys.fulltext_indexes WHERE has_crawl_completed = 0 )
BEGIN
    SELECT outstanding_batch_count, *
    FROM sys.dm_fts_index_population
    WHERE database_id = DB_ID()

    --SELECT *
    --FROM sys.dm_fts_outstanding_batches
    --WHERE database_id = DB_ID()

    WAITFOR DELAY '00:00:05'

    SET @i = @i + 1
    IF @i > 60
    BEGIN
        RAISERROR( 'Too many loops!', 16, 1 )
        BREAK
    END
END

SELECT 'after' ft, * FROM sys.fulltext_indexes
GO

SELECT TOP 1000 *
FROM dbo.yourAddresses ft
WHERE companyId = 42
  AND CONTAINS ( searchTerms, 'data' )
GO

SELECT TOP 1000 *
FROM dbo.yourAddresses a
    INNER JOIN CONTAINSTABLE ( dbo.yourAddresses, searchTerms, 'data' ) ct ON a.rowId = ct.[key]
WHERE a.companyId = 42
GO

SELECT TOP 1000 *
FROM dbo.yourAddresses a
    INNER JOIN CONTAINSTABLE ( dbo.yourAddresses, searchTerms, 'data' ) ct ON a.rowId = ct.[key]
WHERE a.companyId = 42
OPTION ( MERGE JOIN )
GO

SELECT TOP 100 * FROM sys.dm_fts_index_keywords( DB_ID(), OBJECT_ID('dbo.yourAddresses') )
SELECT TOP 100 * FROM sys.dm_fts_index_keywords_by_document( DB_ID(), OBJECT_ID('dbo.yourAddresses') ) ORDER BY document_id
GO
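The script leaves "introduce some skew" as a TODO. One possible way to do it, sketched under the assumption that the script above has already run against `fullTextDemo`: thin every company except 42 down to about 10% of its rows, re-populate the full-text index (change tracking is MANUAL), refresh statistics, then compare a dense company with a sparse one using `SET STATISTICS IO` and `SET STATISTICS TIME`. The 10% ratio and the choice of companies 42 and 7 are arbitrary.

```sql
USE fullTextDemo;
GO

-- Assumed skew: keep all rows for companyId 42, keep ~10% for everyone else.
DELETE dbo.yourAddresses
WHERE companyId <> 42
  AND rowId % 10 <> 0;
GO

-- Change tracking is MANUAL, so re-populate; a full population keeps this simple.
ALTER FULLTEXT INDEX ON dbo.yourAddresses START FULL POPULATION;
GO

-- Give the optimizer an accurate picture of the new distribution.
UPDATE STATISTICS dbo.yourAddresses WITH FULLSCAN;
GO

-- Wait for the population to finish (e.g. with the polling loop above)
-- before running the comparisons below.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Dense company
SELECT TOP 1000 *
FROM dbo.yourAddresses
WHERE companyId = 42
  AND CONTAINS ( searchTerms, 'data' );

-- Sparse company
SELECT TOP 1000 *
FROM dbo.yourAddresses
WHERE companyId = 7
  AND CONTAINS ( searchTerms, 'data' );
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Looking at the actual execution plans for the two queries alongside the IO and timing output should show whether the optimizer picks a different strategy when one side of the mixed query is much more selective.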