
How do I set up a Derived Column transformation to get row numbers in SSIS 2014?

  • April 24, 2017

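I'm trying to import a ragged right flat file. I want to import the file as a single column and use a Derived Column transformation to add a row number to each row. I'd like to end up with a column named RowNum and a column named EntireRow. How can I do this?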

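A Derived Column won't be able to add a row number. Well, it can add a column named RowNum, but the expression language has no way to carry a value from one row to the next, so it cannot produce an incrementing number. Instead, you need a Script Component.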

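You can search for "SSIS generate surrogate key" and find plenty of reference implementations. For this answer I'm going to borrow from Joost's article Create a Row Id. Before editing the script, add an output column named RowNum (four-byte signed integer [DT_I4]) on the Script Component's Inputs and Outputs page, so that Row.RowNum exists in the generated buffer class.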

// C# code: surrogate key script
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
   // Field that keeps the current row number between calls
   private int rowCounter = 0;

   // Called once for each row in your data flow
   public override void Input0_ProcessInputRow(Input0Buffer Row)
   {
       // Increment the counter
       rowCounter++;

       // Fill the new column
       Row.RowNum = rowCounter;
   }
}
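To see the pattern outside the SSIS designer, here is a minimal standalone C# sketch under stated assumptions: NumberedRow and RowNumberer are hypothetical names, the file's lines are assumed to be already read into memory, and ProcessInputRow stands in for the generated Input0_ProcessInputRow. It shows why the script works where a Derived Column does not: the counter is a field on the component instance, so its value survives from one row to the next.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one row of the data flow buffer:
// RowNum is the column the script fills, EntireRow holds the whole line.
public sealed class NumberedRow
{
    public int RowNum { get; set; }
    public string EntireRow { get; set; } = "";
}

// Mimics the ScriptMain pattern outside SSIS (hypothetical class).
public sealed class RowNumberer
{
    // Lives as long as the instance, so it keeps counting across rows.
    private int rowCounter = 0;

    // Plays the role of Input0_ProcessInputRow: called once per incoming row.
    public void ProcessInputRow(NumberedRow row)
    {
        rowCounter++;
        row.RowNum = rowCounter;
    }

    // Feeds each line of a ragged right file (already in memory) through one counter.
    public static List<NumberedRow> NumberLines(IEnumerable<string> lines)
    {
        var numberer = new RowNumberer();
        var result = new List<NumberedRow>();
        foreach (string line in lines)
        {
            var row = new NumberedRow { EntireRow = line };
            numberer.ProcessInputRow(row);
            result.Add(row);
        }
        return result;
    }
}
```

For example, NumberLines(new[] { "A1", "B22" }) yields the rows (1, "A1") and (2, "B22"), matching the RowNum/EntireRow shape asked for in the question.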

For those on SSIS 2005, where the Script Component only supports VB.NET, the same approach looks like this:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute> _
<CLSCompliant(False)> _
Public Class ScriptMain
   Inherits UserComponent

   ' Field that keeps the current row number between calls
   Private rowCounter As Integer = 0

   ' Called once for each row in your data flow
   Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
       ' Increment the counter
       rowCounter = rowCounter + 1

       ' Fill the new column
       Row.RowNum = rowCounter
   End Sub
End Class

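Since I like to provide Biml-based answers, here is the equivalent code, again from Joost's article Creating BIML Script Component Transformation (rownumber). It generates one package per table in the source database and seeds the counter with the maximum rownumber already present in the destination table.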

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Annotations>
 <Annotation>
  File: Script Component Transformation RowNumber.biml
  Description: Example of using the Script Component as
  a transformation to add a rownumber to the destination.
  Note: Example has an OLE DB Destination that supports
  an identity column. Use your own Flat File, Excel or
  PDW destination that doesn't support an identity.
  VS2012 BIDS Helper 1.6.6.0
  By Joost van Rossum http://microsoft-ssis.blogspot.com
 </Annotation>
</Annotations>

<!--Package connection managers-->
   <Connections>
           <OleDbConnection
               Name="Source"
               ConnectionString="Data Source=.;Initial Catalog=ssisjoostS;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;">
           </OleDbConnection>
           <OleDbConnection
               Name="Destination"
               ConnectionString="Data Source=.;Initial Catalog=ssisjoostD;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;">
           </OleDbConnection>
      </Connections>

      <ScriptProjects>
            <ScriptComponentProject ProjectCoreName="sc_c253bef215bf4d6b85dbe3919c35c167.csproj" Name="SCR - Rownumber">
                   <AssemblyReferences>
                          <AssemblyReference AssemblyPath="Microsoft.SqlServer.DTSPipelineWrap" />
                          <AssemblyReference AssemblyPath="Microsoft.SqlServer.DTSRuntimeWrap" />
                          <AssemblyReference AssemblyPath="Microsoft.SqlServer.PipelineHost" />
                          <AssemblyReference AssemblyPath="Microsoft.SqlServer.TxScript" />
                          <AssemblyReference AssemblyPath="System.dll" />
                          <AssemblyReference AssemblyPath="System.AddIn.dll" />
                          <AssemblyReference AssemblyPath="System.Data.dll" />
                          <AssemblyReference AssemblyPath="System.Xml.dll" />
                   </AssemblyReferences>
                   <ReadOnlyVariables>
                          <Variable VariableName="maxrownumber" Namespace="User" DataType="Int32"></Variable>
                   </ReadOnlyVariables>
                   <Files>
      <!-- Left alignment of .Net script to get a neat layout in package-->
                          <File Path="AssemblyInfo.cs">
using System.Reflection;
using System.Runtime.CompilerServices;

//
// General Information about an assembly is controlled through the following 
// set of attributes. Change these attribute values to modify the information
// associated with an assembly.
//
[assembly: AssemblyTitle("SC_977e21e288ea4faaaa4e6b2ad2cd125d")]
[assembly: AssemblyDescription("")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("SSISJoost")]
[assembly: AssemblyProduct("SC_977e21e288ea4faaaa4e6b2ad2cd125d")]
[assembly: AssemblyCopyright("Copyright @ SSISJoost 2015")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
//
// Version information for an assembly consists of the following four values:
//
//      Major Version
//      Minor Version 
//      Build Number
//      Revision
//
// You can specify all the values or you can default the Revision and Build Numbers 
// by using the '*' as shown below:

[assembly: AssemblyVersion("1.0.*")]
                          </File>
      <!-- Replaced greater/less than by &gt; and &lt; -->
                          <File Path="main.cs">#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
#endregion

/// &lt;summary&gt;
/// Rownumber transformation to create an identity column
/// &lt;/summary&gt;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
int rownumber = 0;

/// &lt;summary&gt;
/// Get max rownumber from variable
/// &lt;/summary&gt;
public override void PreExecute()
{
 rownumber = this.Variables.maxrownumber;
}

/// &lt;summary&gt;
/// Increase rownumber and fill rownumber column
/// &lt;/summary&gt;
/// &lt;param name="Row"&gt;The row that is currently passing through the component&lt;/param&gt;
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
 rownumber++;
 Row.rownumber = rownumber;
}
}
                          </File>
                   </Files>
                   <InputBuffer Name="Input0">
                          <Columns>
                          </Columns>
                   </InputBuffer>
                   <OutputBuffers>
                          <OutputBuffer Name="Output0">
                                 <Columns>
                                       <Column Name="rownumber" DataType="Int32"></Column>
                                 </Columns> 
                          </OutputBuffer>
                   </OutputBuffers>
            </ScriptComponentProject>
      </ScriptProjects>

      <Packages>
            <!--A query to get all tables from a certain database and loop through that collection-->
            <# string sConn = @"Provider=SQLNCLI11.1;Server=.;Initial Catalog=ssisjoostS;Integrated Security=SSPI;";#>
            <# string sSQL  = "SELECT name as TableName FROM dbo.sysobjects where xtype = 'U' and category = 0 ORDER BY name";#>
            <# DataTable tblAllTables = ExternalDataAccess.GetDataTable(sConn,sSQL);#>
            <# foreach (DataRow row in tblAllTables.Rows) { #>

            <!--Create a package for each table and use the tablename in the packagename-->
            <Package ProtectionLevel="DontSaveSensitive" ConstraintMode="Parallel" AutoCreateConfigurationsType="None" Name="ssisjoost_<#=row["TableName"]#>"> 
                   <Variables>
                          <Variable Name="maxrownumber" DataType="Int32">0</Variable>
                   </Variables>

                   <!--The tasks of my control flow: get max rownumber and a data flow task-->
                   <Tasks>
                   <!--Execute SQL Task to get max rownumber from destination-->
                   <ExecuteSQL
                          Name="SQL - Get max rownumber <#=row["TableName"]#>"
                          ConnectionName="Destination"
                          ResultSet="SingleRow">
                          <DirectInput>SELECT ISNULL(max([rownumber]),0) as maxrownumber FROM  <#=row["TableName"]#></DirectInput>
                          <Results> 
                          <Result Name="0" VariableName="User.maxrownumber" /> 
                          </Results> 
                   </ExecuteSQL>

                   <!--Data Flow Task to fill the destination table-->
                   <Dataflow Name="DFT - Process <#=row["TableName"]#>">
                   <!--Connect it to the preceding Execute SQL Task-->
                   <PrecedenceConstraints>
                          <Inputs>
                                 <Input OutputPathName="SQL - Get max rownumber <#=row["TableName"]#>.Output"></Input>
                          </Inputs>
                   </PrecedenceConstraints>

                   <Transformations>
                   <!--My source with dynamic, but ugly * which could be replaced by some .NET/SQL code retrieving the columnnames-->
                   <OleDbSource Name="OLE_SRC - <#=row["TableName"]#>" ConnectionName="Source">
                          <DirectInput>SELECT * FROM <#=row["TableName"]#></DirectInput>
                   </OleDbSource>

                   <ScriptComponentTransformation Name="SCR - Rownumber">
                          <ScriptComponentProjectReference ScriptComponentProjectName="SCR - Rownumber" />
                   </ScriptComponentTransformation>

                   <!--My destination with no column mapping because all source columns exist in destination table-->                       
                   <OleDbDestination Name="OLE_DST - <#=row["TableName"]#>" ConnectionName="Destination">
                          <ExternalTableOutput Table="<#=row["TableName"]#>"></ExternalTableOutput>
                   </OleDbDestination>
                   </Transformations>
                   </Dataflow>
                   </Tasks>
            </Package>
            <# } #>
      </Packages>
      </Biml>

<!--Includes/Imports for C#-->
<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.Data"#>
<#@ import namespace="System.Data.SqlClient"#>
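The Biml version adds one twist: PreExecute seeds the counter from User::maxrownumber, which the Execute SQL Task fills with ISNULL(max([rownumber]),0) from the destination, so a rerun continues numbering after the rows already loaded. A minimal standalone C# sketch of that sequencing follows; SeededRowNumberer and NumberBatch are hypothetical names, and the seed argument stands in for the package variable.

```csharp
using System;

// Hypothetical stand-in for the Biml-generated ScriptMain.
public sealed class SeededRowNumberer
{
    private int rownumber = 0;

    // Equivalent of PreExecute(): runs once before any row arrives.
    // The argument plays the role of User::maxrownumber.
    public void PreExecute(int maxRowNumberFromDestination)
    {
        rownumber = maxRowNumberFromDestination;
    }

    // Equivalent of Input0_ProcessInputRow(): increment, then use the value for this row.
    public int ProcessInputRow()
    {
        rownumber++;
        return rownumber;
    }

    // Numbers a batch of incoming rows so they continue after the rows already loaded.
    public static int[] NumberBatch(int maxRowNumberFromDestination, int incomingRowCount)
    {
        var numberer = new SeededRowNumberer();
        numberer.PreExecute(maxRowNumberFromDestination);
        var numbers = new int[incomingRowCount];
        for (int i = 0; i < incomingRowCount; i++)
        {
            numbers[i] = numberer.ProcessInputRow();
        }
        return numbers;
    }
}
```

With an empty destination the seed is 0 and the batch starts at 1; with a seed of 41, two incoming rows get 42 and 43.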

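I'd also mention a simpler approach: define a new column in the destination table, such as RowId INT IDENTITY(1,1), and let the database generate the number when rows are inserted.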

Source: https://dba.stackexchange.com/questions/87196