Sql-Server

在沒有函式的情況下以相同的順序拆分兩個分隔的字元串

  • November 23, 2020

我正在嘗試將帶有分隔字元串的兩列拆分為行。每個字元串中值的位置是相關的,因此我試圖將其拆分,以便相關值在一行中。我無法使用函式,因為我無法在數據庫中創建對象

這是範例表和數據

CREATE TABLE #temp
(id   INT,
keys VARCHAR(50),
vals VARCHAR(50)
);

INSERT INTO #temp
VALUES
(1, '1,2,3', 'one,two,three'),
(2, '4,5,6', 'four,five,six'),
(3, '7,8,9', 'seven,eight,nine');

我想要的輸出是

ID  key  val
1   1    one
1   2    two
1   3    three
2   4    four
2   5    five
2   6    six
3   7    seven
3   8    eight
3   9    nine

如果我只拆分一列,我的查詢就可以工作,所以我用 row_number 定義了兩個 CTE,並在 ID 和 row_number 上加入。這確實提供了所需的輸出,但我的實時表非常大,我希望有一種方法可以只通過表一次,而不是兩次。

with keys as(
SELECT id,keys,vals,
      keys.keyid.value('.', 'VARCHAR(8000)') AS keyid,
     row_number() over(order by (select null)) as rn
FROM
(SELECT id,keys,vals,
          CAST('<Keys><key>'+REPLACE(keys, ',', '</key><key>')+'</key></Keys>' AS XML) AS tempkeys
   FROM #temp
) AS temp
CROSS APPLY tempkeys.nodes('/Keys/key') AS keys(keyid)),
vals as(
SELECT id,keys,vals,
      vals.val.value('.', 'VARCHAR(8000)') AS valid,
     row_number() over(order by (select null)) as rn
FROM
(SELECT id,keys,vals,
          CAST('<vals><val>'+REPLACE(vals, ',', '</val><val>')+'</val></vals>' AS XML) AS tempvals
   FROM #temp
) AS temp
CROSS APPLY tempvals.nodes('/vals/val') AS vals(val))


SELECT k.id, k.keyid, v.valid
FROM keys AS k
    INNER JOIN vals AS v
    ON k.id = v.id
       AND k.rn = v.rn; 

msdb在其他地方或其他地方創建函式。

CREATE FUNCTION dbo.SplitTwoStringsWithSameOrder
(
   @List1  varchar(50),
   @List2  varchar(50),
   @Delim  varchar(10)
)
RETURNS TABLE
AS
   RETURN
   (
     WITH src(r) AS 
     (
       SELECT 1 UNION ALL SELECT r + 1 FROM src WHERE r < 10
     ),
     Numbers(Number) AS 
     (
       SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 
       FROM src AS s1, src AS s2 -- add more if you need longer strings
     ),
     parsed(s1,s2,r1,r2)
     AS
     (
       SELECT
         SUBSTRING(@List1, n1.Number, CHARINDEX(@Delim, @List1 
           + @Delim, n1.Number) - n1.Number),
         SUBSTRING(@List2, n2.Number, CHARINDEX(@Delim, @List2 
           + @Delim, n2.Number) - n2.Number),
         r1 = ROW_NUMBER() OVER (ORDER BY n1.Number),
         r2 = ROW_NUMBER() OVER (ORDER BY n2.Number)
       FROM Numbers AS n1, Numbers AS n2
       ON  n1.Number <= LEN(@List1)
       AND n2.Number <= LEN(@List2)
       AND SUBSTRING(@Delim + @List1, n1.Number, LEN(@Delim)) = @Delim
       AND SUBSTRING(@Delim + @List2, n2.Number, LEN(@Delim)) = @Delim
     )
     SELECT s1, s2, r1, r2 FROM parsed WHERE r1 = r2
   );

然後,正如@gbn 所指出的,在您的查詢必須執行的任何地方通過 3 部分名稱引用它。

CREATE TABLE #temp
(id   INT,
keys VARCHAR(50),
vals VARCHAR(50)
);

INSERT INTO #temp
VALUES
(1, '1,2,3', 'one,two,three'),
(2, '4,5,6', 'four,five,six'),
(3, '7,8,9', 'seven,eight,nine');

SELECT t.id, f.s1, f.s2 FROM #temp AS t
 CROSS APPLY msdb.dbo.SplitTwoStringsWithSameOrder(keys, vals, ',') AS f
 ORDER BY t.id, f.r1;
GO

DROP TABLE #temp;

結果:

在此處輸入圖像描述

計劃資源管理器中顯示的最終計劃*(免責聲明:我是產品經理)*不是我見過的最漂亮的東西(點擊放大一點):

在此處輸入圖像描述

但是只有一次掃描#temp(4% 的成本)。最大的成本是兩種和一個線軸,並且由於工作台而存在一些 I/O,我不確定這是否可以避免。

如果您知道這些字元串中的任何一個都只有 50 個字元,那麼您可以使用內置Numbers表格獲得一個更簡單的計劃(人們反對這些,但它們非常有用,而且它們幾乎總是在記憶體中如果你足夠引用它們)。這對 I/O 沒有幫助,但刪除遞歸 CTE 和其他在函式內建構數字的結構對 CPU 等非常有幫助。

一、數字表:

DROP TABLE dbo.Numbers;

;WITH n AS
(
   SELECT
       TOP (50) rn = ROW_NUMBER() OVER
       (ORDER BY [object_id])
   FROM sys.all_columns 
   ORDER BY [object_id]
)
SELECT [Number] = rn - 1
INTO dbo.Numbers
FROM n;

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);

然後是函式的第二個版本:

CREATE FUNCTION dbo.SplitTwoStringsWithSameOrder2
(
   @List1  varchar(50),
   @List2  varchar(50),
   @Delim  nvarchar(10)
)
RETURNS TABLE
AS
   RETURN
   (
     WITH parsed(s1,s2,r1,r2)
     AS
     (
       SELECT
         SUBSTRING(@List1, n1.Number, CHARINDEX(@Delim, @List1 
           + @Delim, n1.Number) - n1.Number),
         SUBSTRING(@List2, n2.Number, CHARINDEX(@Delim, @List2 
           + @Delim, n2.Number) - n2.Number),
         r1 = ROW_NUMBER() OVER (ORDER BY n1.Number),
         r2 = ROW_NUMBER() OVER (ORDER BY n2.Number)
       FROM dbo.Numbers AS n1
       INNER JOIN dbo.Numbers AS n2
       ON  n1.Number <= LEN(@List1)
       AND n2.Number <= LEN(@List2)
       AND SUBSTRING(@Delim + @List1, n1.Number, LEN(@Delim)) = @Delim
       AND SUBSTRING(@Delim + @List2, n2.Number, LEN(@Delim)) = @Delim
     )
     SELECT s1, s2, r1, r2 FROM parsed WHERE r1 = r2
   );
GO

這是產生的更簡單的計劃(再次點擊放大):

在此處輸入圖像描述

該計劃仍然有兩個排序操作,但線軸已經消失,仍然只有一次掃描#temp,並且在我的有限測試中,成本數字(絕對成本數字,而不是 %)每次都更好。

我不確切知道其中任何一個都會擴展更多行,但值得測試,如果您將其與其他解決方案進行權衡並且它不能很好地擴展,那麼您可能需要重新考慮設計(儲存這些關係而不是逗號分隔的集合)。

引用自:https://dba.stackexchange.com/questions/195010