Multiple Tabs in an Excel Source

August 21, 2009

Here is a bit of SSIS design that some users may find helpfull. Suppose you have an Excel file with multiple tabs, all with the same data. Or, you get a new monthly file each month, but the file name and/or the Tab name is different each time.

Using a ForEach loop, you can loop over all the (Excel) files in your folder, and using the write up in my post about looping over flat files, you can get to each one programatically inside SSIS. This will take care of the situation where the file name changes each month; your loop will execute only once.

But what if you a) have multiple tabs in your Excel file, or b) don’t know in advance what the name(s) of the tab(s) will be. You need some way to examine the Excel file in advance of the Data Flow and get the tab name(s).

I’m going to simplify this scenario and assume we are working with only one file, with multiple tabs. If you have multiple files to work with, simply encase everything here in a For Each loop configured to handle the multiple files.

Start by creating a Variable named TabList, of type Object. This will hold the list of tab names. Add a Data Flow, and in it drag in a Script Task. When prompted, set it as a Source. On the Inputs and Outputs tab, click on the Output 0 and add a Column called TabName, of type string[ST_STR]. In the script itself enter the following code under the CreateNewOutputRows sub:

Dim excelFile As String = “Data Source=” + Me.Variables.FullFileName.ToString +  “;Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties=Excel 8.0;”      

It’s fairly simple code: Open an Excel File as an OLE DB Connection then use the GetSchema(“TABLES”) method to get the list of tables in the file. Finally, send each row of data (containing a tab name) to the Source component’s Output buffer.
Note that I have a bit of logical testing for the currentTable variable. In Excel, tab names end with a ‘$’ as in Sheet1$. And you may also get a tab name of “Sheet1$FilterDatabase”, referencing the exact same range of data. But since we only want one copy of the data, we check to see if the tab name ends in ‘$’ and if it does, use it.
But if your Tab Name contains spaces, then the whole name will be enclosed in single quotes, including the $ delinitor, as in ” ‘My Tab Name with spaces$’ “. (Sort of like SQL Server and encasing object names with [  ] . So that means the $ is NOT the last character of the tab name, but the second to last.
Moving on, now that we have a Source for the Tab Names on the Data Flow, send it to a RecordSet destination. On the Component Properties tab, specify a the User::TabList variable to hold the data. Then on the Input Columns tab, select the TabName column.
Back on your Control Flow, downstream of the Data Flow, add a For Each loop. Set it for Foreach ADO Enumerator on the Colletion tab, and select User::TabList as the ADO Source Object. Leave the selection below it as Rows in the first table.
On the Variable Mappings tab, create a variable User::TabName, type string to hold the Tab Name.
Inside the loop, add your Data Flow task. On it, in the Source connection, set the Data Access Mode as “Table name or view name variable”, then in the Variable name, select User::TabName.














Dim cn As OleDbConnection = New OleDbConnection(excelFile)      
Dim tablesInFile As DataTable = cn.GetSchema(“TABLES”)      
Dim tableCount As Integer = tablesInFile.Rows.Count     
Dim currentTable As String
Dim columnRestrictions(3) As String  

For Each tableInFile As DataRow In tablesInFile.Rows        
 currentTable = tableInFile(“TABLE_NAME”)    
 If Len(currentTable) = InStr(currentTable, “$”) Or _                      
           Len(currentTable) = InStr(currentTable, “$”) + 1 Then
                 Output0Buffer.TabName = currentTable

            End If


 I’ll leave it to you to edit the first line of code to get the proper Connection String for your particular case.