This post is the 3rd and last in a series of 3 planned installments about using Excel data files with Microsoft SQL Server Integration Services.
- Connections and components
- Tables and data types
- Issues and alternatives
Introduction
Don't expect miracles. If you've been following this series, I hope that anticipation - and frustration with the Excel driver - have not led you to expect miracles. I do not have secret insider knowledge, and frankly, some of the issues with the Excel driver simply don't have a solution or workaround that will satisfy everyone.
The good news is, all the issues with the Excel driver are not only known but well-known issues, thanks to its long life and widespread use. However they "present", as a doctor would say, with different symptoms depending on the environment - ADO.NET code, DTS or SSIS packages or other clients. But it's always the same handful of underlying and recurring issues.
And yes, you will find this information all over the Web, because the questions have been asked many times. While I hope to provide a useful summary, I make no claim to new or original information.
The Root of All Evil
The root of all of our problems, of course, is that Excel is not a real database with fixed column metadata. In Column A, I can have a string in Row 1, and a currency value in Row 2, and a date in Row 3. So the driver has to guess at data types, to begin with (which it does by reading a sample of rows), and it only recognizes 6 basic data types.
The Most Common Issues
The principal known issues that affect your data when working with the Excel driver are summarized in the BOL topics on the Excel Source and the Excel Destination. However you have to scroll down to find the sections that are euphemistically titled "Usage Considerations" - to maintain consistency with other source and destination topics, this non-standard section had to go at the bottom.
Most common import issues (see Excel Source for more information on importing)
- Null values. If you've got numbers and strings in the same column, the driver democratically picks the majority type and returns nulls for the rest. Resolved by using Import Mode (IMEX=1) to import everything as a string. Here is the text from BOL:
"Missing values. The Excel driver reads a certain number of rows (by default, 8 rows) in the specified source to guess at the data type of each column. When a column appears to contain mixed data types, especially numeric data mixed with text data, the driver decides in favor of the majority data type, and returns null values for cells that contain data of the other type. (In a tie, the numeric type wins.) Most cell formatting options in the Excel worksheet do not seem to affect this data type determination. You can modify this behavior of the Excel driver by specifying Import Mode. To specify Import Mode, add IMEX=1 to the value of Extended Properties in the connection string of the Excel connection manager in the Properties window. For more information, see PRB: Excel Values Returned as NULL Using DAO OpenRecordset."
- Truncated strings. If you've got <255 and >255 strings, you have to have at least 1 >255 string in the rows that the driver samples, or it interprets the column as <255 strings and truncates longer ones. Resolved by including a >255 string in the sampled rows. Here is the text from BOL:
"Truncated text. When the driver determines that an Excel column contains text data, the driver selects the data type (string or memo) based on the longest value that it samples. If the driver does not discover any values longer than 255 characters in the rows that it samples, it treats the column as a 255-character string column instead of a memo column. Therefore, values longer than 255 characters may be truncated. To import data from a memo column without truncation, you must make sure that the memo column in at least one of the sampled rows contains a value longer than 255 characters, or you must increase the number of rows sampled by the driver to include such a row. You can increase the number of rows sampled by increasing the value of TypeGuessRows under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key. For more information, see PRB: Transfer of Data from Jet 4.0 OLEDB Source Fails w/ Error."
Most common export issues (see Excel Destination for more information on exporting)
I do not know of solutions for avoiding either of these export issues . It seems that they have to be handled in Excel after the export.
- Single quote before text data. To ensure that string data will be treated as strings in Excel, the driver does us the favor of inserting a single quote before each string value. (This is the old Lotus 1-2-3 convention for strings that old-timers will remember.) This quote character is not visible when you look at the worksheet, but may cause problems as you work with the data.
- Exported numbers appear in scientific (exponential) notation. I've only heard about this problem in recent years, and I'm not sure what changed in the driver or in Excel to cause it. I had trouble reproducing the issue. However in some cases it leads to loss of precision in numeric values.
A Quick Test
Is your import from Excel going to work? Give it a quick test by using the SQL Server Import and Export Wizard! Go as far as the Select Source Tables and Views page so that you can click on the Edit Mappings button to view the data types (and, for string columns, the sizes) that the driver has identified for your columns. Don't trust the Preview window alone without also looking at the data types and sizes on the Edit Mappings page!
Consider testing with IMEX=1 and without it. How do you specify IMEX in the Wizard, since we've done you the favor of explicitly hiding the Jet provider in the list? (The same favor that prevents you from importing DBF files!) Pick the Office 12 provider instead. You have to type or paste in the Excel file path - there's no Browse dialog - but on the All page, you can add "IMEX=1" along with the Excel version for Extended Properties.
Alternatives to Using the Excel Driver
Automate the Excel Application
"Okay then, if the driver won't do what I want, I'll just write some code to automate Excel from my package." This is naughty - automation of Office client apps from server apps that often run unattended is asking for unpleasant surprises (and is unsupported). For more information, see Considerations for server-side Automation of Office.
Use a Third-party Component
You can save data to Excel without using either the driver or Excel itself by calling the API of SoftArtisans OfficeWriter product, as described in Creating an Excel Batch-Reporting Solution With Integration Services and OfficeWriter.
Write Excel XML
Get creative and be the first on your block to write out Excel XML rather than binary files. Sure, it takes a lot of lines of rather tedious code...but it avoids the driver! There's an entire MSDN Developer Center for Office XML, but these may be the best articles at this writing for getting started with Excel XML:
See also
- Excel Driver Issues and Tips, a file available for download on this site. This is a list of Microsoft Knowledge Base articles related to Excel driver capabilities and issues in various client and server applications. While none of the KB articles in this [slightly out-of-date] list are specific to SSIS, you may find them useful in other projects, or for the background information that they provide.
This is our final installment in this series.
Thank you for following along. Please don't hesitate to comment and contribute! This blog allows anonymous comments, although I moderate comments to guard against anything inappropriate.
-Doug
Posted
Jun 21 2008, 01:05 PM
by
dougbert