Excel in Integration Services, Part 2 of 3: Tables and Data Types
This post is the 2nd in a series of 3 planned installments about using Excel data files with Microsoft SQL Server Integration Services.
- Connections and components
- Tables and data types
- Issues and alternatives
Tables
What's a table in Excel?
The source of data in an Excel workbook can be:
- A worksheet (for example, Sheet1$), which you can recognize in a list because it has the $ appended. In a SQL statement, the name of a worksheet must be delimited (for example, [Sheet1$]) to avoid a syntax error caused by the $ sign. If you use the Query Builder, it automatically adds these delimiters.
- A named range (for example, MyRange), which you can recognize in a list because it does not have the $ appended.
- An unnamed range specified by using cell coordinates (for example, Sheet1$A1:C100), which you normally have to specify in a SELECT statement.
Both worksheets and named ranges appear in lists of "tables" in an Excel data source.
Working with Excel tables
When you use the Excel driver to CREATE the table, as you might do for an Excel destination in the SQL Server Import and Export Wizard, the driver creates both a worksheet (MyOutput$) and a named range (MyOutput) with the name that you supply.
Whether you specify a worksheet or a range, the driver reads the contiguous block of cells starting with the first non-empty cell in the upper-left corner of the worksheet or range. Therefore you cannot have empty rows in the source data, or an empty row between the header row and the data rows. Similarly, if you use cell A1 for a worksheet title, you may have to assign a name to your range of data farther down, and use the named range instead of the worksheet name.
When you specify a range, you get an error if the range is not wide enough; that is, if it has fewer columns than you need. However, if the range that you've defined isn't long enough - that is, if it has fewer rows - the driver is intelligent enough not just to continue writing rows, but to extend the range definition as well.
Deleting and reusing
If you try to delete Excel data or tables through the Excel driver (for example, in an Execute SQL task), you'll probably get results that you don't expect. In my most recent testing:
- A DELETE statement fails with an error.
- A DROP TABLE statement, with either a named range or a worksheet, "succeeds" and, ironically, deletes the data, but does not delete the range or the worksheet.
It's not easy to re-use a worksheet that you used previously as a destination. Clearing the cell values or rows in Excel that were loaded by the driver does not make those rows "reusable"; the driver still treats the rows as if they contained data. Deleting the physical rows in Excel solves this problem, but if you're using a named range, of course that also shrinks or deletes your range.
If you want to "re-use" a worksheet in a particular format - for example, for a daily export to Excel - the cleanest solution is to create a blank workbook to save as a template, and to use a File System task to copy that workbook into the working directory when needed. Leave the template intact.
Data types
The Excel driver recognizes only a limited set of data types. For example, all numeric columns are interpreted as doubles (DT_R8), and all string columns (other than memo columns) are interpreted as 255-character Unicode strings (DT_WSTR).
Integration Services maps the Excel data types as follows:
- Numeric – double-precision float (DT_R8)
- Currency – currency (DT_CY)
- Boolean – Boolean (DT_BOOL)
- Date/time – datetime (DT_DATE)
- String – Unicode string, length 255 (DT_WSTR)
- Memo – Unicode text stream (DT_NTEXT)
"Hey, columns don't have data types in Excel!" No, and this is the root of the most common issues when using the Excel driver. So the driver assigns data types by sampling several rows of data (typically 8, by default) and picking the data type the represents the majority of the values in each column. In other words, it guesses. If Column B contains 5 strings and 3 numbers, then it's a string column. In a tie (for example, 4 strings and 4 numbers), numeric wins...this is a spreadsheet, after all! (Yes, I bothered to test this.) You can increase the number of rows sampled by increasing the value of TypeGuessRows under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key.
"I wish I knew where to find more information about the data types." Well, if you have Microsoft Office, it's hidden on your hard drive in the file JETSQL40.CHM. I have a copy of this Help file in both C:\Program Files\Common Files\microsoft shared\OFFICE12\1033 (from Office 2007) and C:\Program Files\Common Files\microsoft shared\OFFICE11\1033 (from Office 2003).
Data type and length conversions
Integration Services does not implicitly convert data types. As a result, you may need to use Data Conversion or Derived Column transformations to convert Excel data explicitly before loading it into a non-Excel destination, or to convert non-Excel data before loading it into an Excel destination. In this case, it may be useful to create the initial package by using the Import and Export Wizard, which warns you about data type differences (in SQL Server 2008) and configures the necessary conversions for you. Some examples of the conversions that may be required include the following:
- Conversion between Unicode Excel string columns and non-Unicode string columns with specific codepages
- Conversion between 255-character Excel string columns and string columns of different lengths
- Conversion between double-precision Excel numeric columns and numeric columns of other types
See also
Although these Books Online topics were also listed in our previous installment on Connections and components, they both discuss how to work with Excel tables, as well as Excel files.
In our next installment...
In Issues and alternatives, we'll take a look at the most common issues when importing from, or exporting to, Excel.
- Doug