Sql-Server

Query to normalize a table / combine row text

  • July 9, 2011

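I have a table (call it oldTable) with the following columns:

ID (int), Rank (int), TextLineNumber (int), SomeText (varchar)

The primary key is multi-part: ID+Rank+TextLineNumber.

I am trying to convert/join it into another table (call it newTable) with the following columns:

ID (int), Rank (int), CombinedText (varchar)

The primary key is ID+Rank.

ID and Rank are already populated on the new table, but I need a query that updates newTable's CombinedText column, taking the following caveats into account:

  1. The Rank given on the new table may not exist on the old table, in which case the query needs to pick the highest available rank from the old table that is not greater than the rank on the new table.
  2. The CombinedText column is a string concatenation of the old table's "SomeText" column, joined in "TextLineNumber" order, using the Rank found in the first caveat.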
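The first caveat can be sketched on its own as a correlated lookup. This is a minimal illustration, not the asker's actual code: the table variables @Old and @New are stand-ins for oldTable and newTable, loaded with a few rows from the sample data, and the output column name resolved_rank is made up for the example.

```sql
-- Stand-in table variables (assumed names) for oldTable / newTable.
DECLARE @Old TABLE (id INT, rank INT, linenumber INT, sometext VARCHAR(1000));
DECLARE @New TABLE (id INT, rank INT, combinedtext VARCHAR(8000));

-- SQL Server 2005 has no multi-row VALUES, so rows are loaded with UNION ALL.
INSERT INTO @Old (id, rank, linenumber, sometext)
SELECT 3, 1, 1, 'ABC' UNION ALL
SELECT 3, 1, 2, 'DEF' UNION ALL
SELECT 3, 50, 1, 'XYZ';

INSERT INTO @New (id, rank)
SELECT 3, 4 UNION ALL
SELECT 3, 50 UNION ALL
SELECT 3, 55;

-- For each new row, find the highest old rank that does not exceed n.rank.
-- resolved_rank is NULL when the old table has no qualifying rank for that id.
SELECT n.id,
       n.rank,
       (SELECT MAX(o.rank)
        FROM   @Old o
        WHERE  o.id = n.id
               AND o.rank <= n.rank) AS resolved_rank
FROM   @New n
ORDER  BY n.id, n.rank;
-- Rows returned: (3, 4, 1), (3, 50, 50), (3, 55, 50)
```

Once the rank is resolved this way, the second caveat reduces to concatenating the lines of a single (ID, Rank) group in line-number order.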

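Here is some sample data:

Old - http://i54.tinypic.com/jq0vmx.png

New - http://i53.tinypic.com/dhfyn8.png

In case it matters, I am using MSSQL 2005. I currently do this with T-SQL and a while loop, but it has become a serious performance bottleneck (about 1 minute for 10,000 rows).

Edit: expanded sample data as CSV:

Old: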

ID,Rank,LineNumber,SomeText
1,1,1,the qu  
1,1,2,ick br  
1,1,3,own  
1,2,1,some te  
1,2,2,xt  
1,3,1,sample  
2,7,1,jumped ov  
2,7,2,er the  
2,7,3,lazy  
2,13,1,samp  
2,13,2,le text  
3,1,1,ABC  
3,1,2,DEF  
3,1,3,GHI  
3,1,4,JKL  
3,50,1,XYZ

New:

ID,Rank,CombinedText
1,2,some text
2,13,sample text
2,14,sample text
3,4,ABCDEFGHIJKL
3,5,ABCDEFGHIJKL
3,50,XYZ
3,55,XYZ

Edit 2:

Here is a sample query I found that does work but is not fast enough (it relies on multiple subqueries):

update newtable set combinedtext = 
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=1),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=2),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=3),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=4),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=5),'')

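It also assumes a maximum of 5 lines, which may not be the case. I don't mind hard-coding the line numbers up to 20 if necessary, but ideally the query would handle any number of lines. The goal is to get the execution time under 20 seconds on the real data.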
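Before hard-coding a line limit such as 5 or 20, it may be worth checking how many lines a single (ID, Rank) group actually has. The following is a small sketch under assumptions: @Old is a stand-in table variable for oldTable, loaded with a few rows from the sample data, and the aliases line_count and max_lines are illustrative names.

```sql
-- Stand-in table variable (assumed name) for oldTable.
DECLARE @Old TABLE (id INT, rank INT, linenumber INT, sometext VARCHAR(1000));

INSERT INTO @Old (id, rank, linenumber, sometext)
SELECT 3, 1, 1, 'ABC' UNION ALL
SELECT 3, 1, 2, 'DEF' UNION ALL
SELECT 3, 1, 3, 'GHI' UNION ALL
SELECT 3, 1, 4, 'JKL' UNION ALL
SELECT 3, 50, 1, 'XYZ';

-- Count the lines in each (id, rank) group, then take the largest count.
SELECT MAX(c.line_count) AS max_lines
FROM   (SELECT id, rank, COUNT(*) AS line_count
        FROM   @Old
        GROUP  BY id, rank) AS c;
-- Returns 4 for the rows above (id 3, rank 1 has four lines).
```

If the result on the real data exceeds the hard-coded number of subqueries, the extra lines would be silently dropped from CombinedText.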

This should work. I'll clean it up later so that it is more efficient.

DECLARE @Old TABLE ( 
 id         INT, 
 rank       INT, 
 linenumber INT, 
 sometext   VARCHAR(1000)) 
DECLARE @New TABLE ( 
 id           INT, 
 rank         INT, 
 combinedtext VARCHAR(1000)) 


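-- Recursive CTE: the anchor member picks the first line of each (id, rank) group;
-- the recursive member appends later lines. Within a group, only the row with the
-- highest ctid holds the complete text.
;WITH combinedresults(ctid, id, rank, linenumber, combinedtext) 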
    AS (SELECT 0, 
               id, 
               rank, 
               linenumber, 
               CAST (sometext AS VARCHAR(8000)) 
        FROM   @Old o 
        WHERE  NOT EXISTS (SELECT TOP 1 1 
                           FROM   @Old 
                           WHERE  id = o.id 
                                  AND rank = o.rank 
                                  AND linenumber < o.linenumber) 
        UNION ALL 
        SELECT ctid + 1, 
               o.id, 
               o.rank, 
               o.linenumber, 
               ct.combinedtext + o.sometext 
        FROM   @Old o 
               INNER JOIN combinedresults ct 
                 ON ct.id = o.id 
                    AND ct.rank = o.rank 
        WHERE  o.linenumber > ct.linenumber) 

UPDATE n 
SET    combinedtext = ct.combinedtext 
FROM   @New n 
      INNER JOIN (SELECT n.id, 
                         n.rank, 
                         MAX(o.rank) orank 
                  FROM   @new n 
                         INNER JOIN @Old o 
                           ON n.id = o.id 
                              AND o.rank <= n.rank 
                  GROUP  BY n.id, 
                            n.rank) r 
        ON n.id = r.id 
           AND n.rank = r.rank 
      INNER JOIN (SELECT id, 
                         ct.rank, 
                         MAX(ctid) ctid 
                  FROM   combinedresults ct 
                  GROUP  BY ct.id, 
                            ct.rank) r2 
        ON r2.id = r.id 
           AND r2.rank = r.orank 
      INNER JOIN combinedresults ct 
        ON r.id = ct.id 
           AND ct.rank = r.orank 
           AND ct.ctid = r2.ctid 

SELECT * 
FROM   @New 
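As an alternative to the recursive CTE, the same update can be sketched with CROSS APPLY and FOR XML PATH(''), both available in SQL Server 2005. The recursive member above joins on o.linenumber > ct.linenumber, so it also builds partial combinations that are discarded later; the version below reads each group's lines once. This is a sketch, not a tuned solution: the table variables and their primary keys are stand-ins for the real tables, the VARCHAR(8000) length is an assumption, and unlike the query above it sets combinedtext to '' when no rank qualifies.

```sql
-- Stand-in table variables (assumed names and keys) for oldTable / newTable.
DECLARE @Old TABLE (id INT, rank INT, linenumber INT, sometext VARCHAR(1000),
                    PRIMARY KEY (id, rank, linenumber));
DECLARE @New TABLE (id INT, rank INT, combinedtext VARCHAR(8000),
                    PRIMARY KEY (id, rank));

-- Sample rows from the question (SQL Server 2005 has no multi-row VALUES).
INSERT INTO @Old (id, rank, linenumber, sometext)
SELECT 1, 1, 1, 'the qu'    UNION ALL
SELECT 1, 1, 2, 'ick br'    UNION ALL
SELECT 1, 1, 3, 'own'       UNION ALL
SELECT 1, 2, 1, 'some te'   UNION ALL
SELECT 1, 2, 2, 'xt'        UNION ALL
SELECT 1, 3, 1, 'sample'    UNION ALL
SELECT 2, 7, 1, 'jumped ov' UNION ALL
SELECT 2, 7, 2, 'er the'    UNION ALL
SELECT 2, 7, 3, 'lazy'      UNION ALL
SELECT 2, 13, 1, 'samp'     UNION ALL
SELECT 2, 13, 2, 'le text'  UNION ALL
SELECT 3, 1, 1, 'ABC'       UNION ALL
SELECT 3, 1, 2, 'DEF'       UNION ALL
SELECT 3, 1, 3, 'GHI'       UNION ALL
SELECT 3, 1, 4, 'JKL'       UNION ALL
SELECT 3, 50, 1, 'XYZ';

INSERT INTO @New (id, rank)
SELECT 1, 2  UNION ALL
SELECT 2, 13 UNION ALL
SELECT 2, 14 UNION ALL
SELECT 3, 4  UNION ALL
SELECT 3, 5  UNION ALL
SELECT 3, 50 UNION ALL
SELECT 3, 55;

UPDATE n
SET    combinedtext = COALESCE(c.combinedtext, '')
FROM   @New n
       -- Caveat 1: highest old rank that does not exceed the new rank.
       CROSS APPLY (SELECT MAX(o.rank) AS orank
                    FROM   @Old o
                    WHERE  o.id = n.id
                           AND o.rank <= n.rank) r
       -- Caveat 2: concatenate that group's lines in linenumber order.
       -- TYPE plus .value() returns plain text, so characters such as
       -- '<' or '&' are not left XML-escaped.
       CROSS APPLY (SELECT (SELECT o.sometext AS [text()]
                            FROM   @Old o
                            WHERE  o.id = n.id
                                   AND o.rank = r.orank
                            ORDER  BY o.linenumber
                            FOR XML PATH(''), TYPE
                           ).value('.', 'VARCHAR(8000)') AS combinedtext) c;

SELECT id, rank, combinedtext
FROM   @New
ORDER  BY id, rank;
```

On the sample rows, the final SELECT should reproduce the CombinedText values listed in the question's "New" data, for example 'some text' for (1, 2) and 'ABCDEFGHIJKL' for (3, 4). It handles any number of lines per group, so no line count needs to be hard-coded.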

Source: https://dba.stackexchange.com/questions/2193