Sql-Server
SQL Server PATINDEX issue/bug when using different case-sensitive collations
I have a function (which I found here a few years ago) that uses STUFF/PATINDEX to strip non-alphanumeric characters from a string. It works fine when run under a case-insensitive collation. I recently needed to use it in a database with a case-sensitive collation and found some strange behavior. If the **%pattern%** for PATINDEX specifies lower case only (for example: %[^a-z0-9_-]%), then every upper-case "Z" is removed when the collation is **Latin1_General_100_CS_AS**. If the collation is SQL_Latin1_General_CP1_CS_AS, every upper-case "A" is removed instead. Is this a bug, or am I missing something?

    USE TestCollation
    GO
    PRINT '---------SourceString---------'
    PRINT 'ABCDEFGHIJKLMNO_Z_A_PQRSTUVWXYZ-abcdefghijklmnopqrstuvwxyz'
    GO

    ALTER DATABASE TestCollation COLLATE SQL_Latin1_General_CP1_CI_AS;
    GO
    PRINT '---------SQL_Latin1_General_CP1_CI_AS---------'
    GO
    DECLARE @ExternalId VARCHAR(255) = 'ABCDEFGHIJKLMNO_Z_A_PQRSTUVWXYZ-abcdefghijklmnopqrstuvwxyz'
    DECLARE @return VARCHAR(255)
    SET @return = @ExternalId
    DECLARE @KeepValues AS VARCHAR(50)
    SET @KeepValues = '%[^a-z0-9_-]%'
    -- Remove one non-matching character per iteration until none remain
    WHILE PATINDEX ( @KeepValues, @return ) > 0
    BEGIN
        SET @return = STUFF ( @return, PATINDEX ( @KeepValues, @return ), 1, '' )
    END
    PRINT @return
    GO

    ALTER DATABASE TestCollation COLLATE Latin1_General_100_CS_AS;
    GO
    PRINT '---------Latin1_General_100_CS_AS---------'
    GO
    DECLARE @ExternalId VARCHAR(255) = 'ABCDEFGHIJKLMNO_Z_A_PQRSTUVWXYZ-abcdefghijklmnopqrstuvwxyz'
    DECLARE @return VARCHAR(255)
    SET @return = @ExternalId
    DECLARE @KeepValues AS VARCHAR(50)
    SET @KeepValues = '%[^a-z0-9_-]%'
    WHILE PATINDEX ( @KeepValues, @return ) > 0
    BEGIN
        SET @return = STUFF ( @return, PATINDEX ( @KeepValues, @return ), 1, '' )
    END
    PRINT @return
    GO

    ALTER DATABASE TestCollation COLLATE SQL_Latin1_General_CP1_CS_AS;
    GO
    PRINT '---------SQL_Latin1_General_CP1_CS_AS---------'
    GO
    DECLARE @ExternalId VARCHAR(255) = 'ABCDEFGHIJKLMNO_Z_A_PQRSTUVWXYZ-abcdefghijklmnopqrstuvwxyz'
    DECLARE @return VARCHAR(255)
    SET @return = @ExternalId
    DECLARE @KeepValues AS VARCHAR(50)
    SET @KeepValues = '%[^a-z0-9_-]%'
    WHILE PATINDEX ( @KeepValues, @return ) > 0
    BEGIN
        SET @return = STUFF ( @return, PATINDEX ( @KeepValues, @return ), 1, '' )
    END
    PRINT @return

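This is not a bug. It is simply the difference between upper-case-first and lower-case-first ordering when sorting case-sensitively. Although it may look as if no sorting is taking place, [...] the character-range wildcard used in both LIKE and PATINDEX does, in a sense, sort the characters when it applies a range, that is, any {character}-{character} pattern (in this case, a-z and 0-9). So the two options are: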
- AaBbCc…Zz (upper case first: a-z does not include A)
- aAbBcC…zZ (lower case first: a-z does not include Z)
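SQL Server collations (i.e. those whose names start with SQL_) mostly use one approach, while Windows collations (i.e. those whose names do not start with SQL_) use the other. In the two collations from the question, SQL_Latin1_General_CP1_CS_AS sorts upper case first and Latin1_General_100_CS_AS sorts lower case first. To illustrate the behavior: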

    SELECT *
    FROM (VALUES ('A'), ('a'), ('Z'), ('z')) tmp (val)
    WHERE tmp.val LIKE '%[a-z]%' COLLATE SQL_Latin1_General_CP1_CS_AS
    ORDER BY tmp.val COLLATE SQL_Latin1_General_CP1_CS_AS;
    /*
    a
    Z
    z
    */

    SELECT *
    FROM (VALUES ('A'), ('a'), ('Z'), ('z')) tmp (val)
    WHERE tmp.val LIKE '%[a-z]%' COLLATE Latin1_General_100_CS_AS
    ORDER BY tmp.val COLLATE Latin1_General_100_CS_AS;
    /*
    a
    A
    z
    */

    SELECT *
    FROM (VALUES ('A'), ('a'), ('Z'), ('z')) tmp (val)
    WHERE tmp.val LIKE '%[a-z]%' COLLATE Latin1_General_100_BIN2
    ORDER BY tmp.val COLLATE Latin1_General_100_BIN2;
    /*
    a
    z
    */

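The same comparison can be laid out side by side, one column per collation, which makes it easier to see which letters fall inside a-z in each case. This is only a sketch: the alias v, the column names in_sql_cs, in_win_cs and in_bin2, and the extra probe letters M and m are illustrative choices, not part of the original answer.

```sql
-- For each probe character, report whether it falls inside [a-z]
-- under each of the three collations discussed above (1 = inside the range).
SELECT v.val,
       CASE WHEN v.val LIKE '[a-z]' COLLATE SQL_Latin1_General_CP1_CS_AS
            THEN 1 ELSE 0 END AS in_sql_cs,  -- hypothetical column name
       CASE WHEN v.val LIKE '[a-z]' COLLATE Latin1_General_100_CS_AS
            THEN 1 ELSE 0 END AS in_win_cs,  -- hypothetical column name
       CASE WHEN v.val LIKE '[a-z]' COLLATE Latin1_General_100_BIN2
            THEN 1 ELSE 0 END AS in_bin2     -- hypothetical column name
FROM (VALUES ('A'), ('a'), ('M'), ('m'), ('Z'), ('z')) v (val);
```

Going by the results above, A should be flagged only in in_win_cs and Z only in in_sql_cs. If the two orderings are as described, a letter in the middle of the alphabet such as M should be flagged in both case-sensitive columns but not in in_bin2, which is why a lower-case-only range appears to "mostly work" until an A or a Z shows up.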
FYI: you can avoid depending on the database's default collation by forcing a collation in both calls to PATINDEX:

    GO
    DECLARE @ExternalId VARCHAR(255) = 'ABCDEFGHIJKLMNO_Z_A_PQRSTUVWXYZ-abcdefghijklmnopqrstuvwxyz'
    DECLARE @return VARCHAR(255)
    SET @return = @ExternalId
    DECLARE @KeepValues AS VARCHAR(50)
    SET @KeepValues = '%[^a-z0-9_-]%'
    -- The explicit case-insensitive collation on the pattern overrides the database default
    WHILE PATINDEX ( @KeepValues COLLATE Latin1_General_100_CI_AS, @return ) > 0
    BEGIN
        SET @return = STUFF ( @return, PATINDEX ( @KeepValues COLLATE Latin1_General_100_CI_AS, @return ), 1, '' )
    END
    PRINT @return
    GO

Alternatively, add A-Z to the pattern:

    SET @KeepValues = '%[^a-zA-Z0-9_-]%'
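
The two suggestions can also be combined. The following is a sketch rather than part of the original answer: it assumes a binary collation is acceptable, spells out both a-z and A-Z, and forces Latin1_General_100_BIN2 on the pattern so that each range is evaluated by code point, as in the BIN2 query above. The @pos variable is introduced here only to avoid calling PATINDEX twice per iteration.

```sql
DECLARE @ExternalId VARCHAR(255) = 'ABCDEFGHIJKLMNO_Z_A_PQRSTUVWXYZ-abcdefghijklmnopqrstuvwxyz';
DECLARE @return VARCHAR(255) = @ExternalId;
DECLARE @KeepValues VARCHAR(50) = '%[^a-zA-Z0-9_-]%';

-- @pos is a hypothetical helper variable: position of the first character to strip
DECLARE @pos INT = PATINDEX(@KeepValues COLLATE Latin1_General_100_BIN2, @return);

WHILE @pos > 0
BEGIN
    -- Delete the offending character, then look for the next one
    SET @return = STUFF(@return, @pos, 1, '');
    SET @pos = PATINDEX(@KeepValues COLLATE Latin1_General_100_BIN2, @return);
END;

PRINT @return;
```

Because every character in this sample string is a letter, an underscore, or a hyphen, the string should come back unchanged regardless of the database's default collation. One trade-off to keep in mind: under a binary collation the ranges cover only the unaccented letters a-z and A-Z, so accented letters such as é would be stripped as well.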