Jun
21
2008

Excel in Integration Services, Part 3 of 3: Issues and Alternatives

This post is the 3rd and last in a series of 3 planned installments about using Excel data files with Microsoft SQL Server Integration Services.

  1. Connections and components
  2. Tables and data types
  3. Issues and alternatives

Introduction

Don't expect miracles. If you've been following this series, I hope that anticipation - and frustration with the Excel driver - have not led you to expect miracles. I do not have secret insider knowledge, and frankly, some of the issues with the Excel driver simply don't have a solution or workaround that will satisfy everyone.

The good news is, all the issues with the Excel driver are not only known but well-known issues, thanks to its long life and widespread use. However they "present", as a doctor would say, with different symptoms depending on the environment - ADO.NET code, DTS or SSIS packages or other clients. But it's always the same handful of underlying and recurring issues.

And yes, you will find this information all over the Web, because the questions have been asked many times. While I hope to provide a useful summary, I make no claim to new or original information.

The Root of All Evil

The root of all of our problems, of course, is that Excel is not a real database with fixed column metadata. In Column A, I can have a string in Row 1, and a currency value in Row 2, and a date in Row 3. So the driver has to guess at data types, to begin with (which it does by reading a sample of rows), and it only recognizes 6 basic data types.

The Most Common Issues

The principal known issues that affect your data when working with the Excel driver are summarized in the BOL topics on the Excel Source and the Excel Destination. However you have to scroll down to find the sections that are euphemistically titled "Usage Considerations" - to maintain consistency with other source and destination topics, this non-standard section had to go at the bottom.

Most common import issues (see Excel Source for more information on importing)

  • Null values. If you've got numbers and strings in the same column, the driver democratically picks the majority type and returns nulls for the rest. Resolved by using Import Mode (IMEX=1) to import everything as a string. Here is the text from BOL:
    "Missing values. The Excel driver reads a certain number of rows (by default, 8 rows) in the specified source to guess at the data type of each column. When a column appears to contain mixed data types, especially numeric data mixed with text data, the driver decides in favor of the majority data type, and returns null values for cells that contain data of the other type. (In a tie, the numeric type wins.) Most cell formatting options in the Excel worksheet do not seem to affect this data type determination. You can modify this behavior of the Excel driver by specifying Import Mode. To specify Import Mode, add IMEX=1 to the value of Extended Properties in the connection string of the Excel connection manager in the Properties window. For more information, see PRB: Excel Values Returned as NULL Using DAO OpenRecordset."
  • Truncated strings. If you've got <255 and >255 strings, you have to have at least 1 >255 string in the rows that the driver samples, or it interprets the column as <255 strings and truncates longer ones. Resolved by including a >255 string in the sampled rows. Here is the text from BOL:
    "Truncated text. When the driver determines that an Excel column contains text data, the driver selects the data type (string or memo) based on the longest value that it samples. If the driver does not discover any values longer than 255 characters in the rows that it samples, it treats the column as a 255-character string column instead of a memo column. Therefore, values longer than 255 characters may be truncated. To import data from a memo column without truncation, you must make sure that the memo column in at least one of the sampled rows contains a value longer than 255 characters, or you must increase the number of rows sampled by the driver to include such a row. You can increase the number of rows sampled by increasing the value of TypeGuessRows under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key. For more information, see PRB: Transfer of Data from Jet 4.0 OLEDB Source Fails w/ Error."

Most common export issues (see Excel Destination for more information on exporting)

I do not know of solutions for avoiding either of these export issues . It seems that they have to be handled in Excel after the export.

  • Single quote before text data. To ensure that string data will be treated as strings in Excel, the driver does us the favor of inserting a single quote before each string value. (This is the old Lotus 1-2-3 convention for strings that old-timers will remember.) This quote character is not visible when you look at the worksheet, but may cause problems as you work with the data.
  • Exported numbers appear in scientific (exponential) notation. I've only heard about this problem in recent years, and I'm not sure what changed in the driver or in Excel to cause it. I had trouble reproducing the issue. However in some cases it leads to loss of precision in numeric values.

A Quick Test

Is your import from Excel going to work? Give it a quick test by using the SQL Server Import and Export Wizard! Go as far as the Select Source Tables and Views page so that you can click on the Edit Mappings button to view the data types (and, for string columns, the sizes) that the driver has identified for your columns. Don't trust the Preview window alone without also looking at the data types and sizes on the Edit Mappings page!

Consider testing with IMEX=1 and without it. How do you specify IMEX in the Wizard, since we've done you the favor of explicitly hiding the Jet provider in the list? (The same favor that prevents you from importing DBF files!) Pick the Office 12 provider instead. You have to type or paste in the Excel file path - there's no Browse dialog - but on the All page, you can add "IMEX=1" along with the Excel version for Extended Properties.

Alternatives to Using the Excel Driver

Automate the Excel Application

"Okay then, if the driver won't do what I want, I'll just write some code to automate Excel from my package." This is naughty - automation of Office client apps from server apps that often run unattended is asking for unpleasant surprises (and is unsupported). For more information, see Considerations for server-side Automation of Office.

Use a Third-party Component

You can save data to Excel without using either the driver or Excel itself by calling the API of SoftArtisans OfficeWriter product, as described in Creating an Excel Batch-Reporting Solution With Integration Services and OfficeWriter.

Write Excel XML

Get creative and be the first on your block to write out Excel XML rather than binary files. Sure, it takes a lot of lines of rather tedious code...but it avoids the driver! There's an entire MSDN Developer Center for Office XML, but these may be the best articles at this writing for getting started with Excel XML:

See also

  • Excel Driver Issues and Tips, a file available for download on this site. This is a VERY OLD list of Microsoft Knowledge Base articles related to Excel driver capabilities and issues in various client and server applications. While none of the KB articles in this [slightly out-of-date] list are specific to SSIS, you may find them useful in other projects, or for the background information that they provide.

This is our final installment in this series.

Thank you for following along. Please don't hesitate to comment and contribute! This blog allows anonymous comments, although I moderate comments to guard against anything inappropriate.

-Doug

Comments (4) -

Peter

Is there any way to default IMEX=1 so it can just be used straight from the Wizard?  I ask because I haven't encountered that anywhere.  I deal with so many differing Excel formats regularly that this is actually important/useful if I can do it.


Not expecting miracles, but it would sure be nice if we knew that people were working on the various Excel issues.


I think my favorite so far has been that when I export Excel data to a Tab or CSV file, it drops off columns if they don't have data to the end of the row.  Results in 14 columns, but only 12 on a particular row.  Add in SSIS not honoring the EOL character in 2005 and it's an exercise in frustration.  Try reporting on Connect - get closed with a "by design" note or something to that effect.  Definitely makes me feel like I'm alone in crying out for Excel help with SSIS.


I appreciate your series on this.  It's good to know that I'm not alone and that someone feels the pain of using Excel with SSIS.


dougbert

Peter,


Since the IMEX "Import Mode" option does not seem to correspond to a fixed Registry key, I don't know of any way to set it as the default. In fact, you can't easily set it at all in the Import and Export Wizard, since there's no way to set Extended Properties for the "Microsoft Excel" option; you have to choose the Office 12 provider to have access to Extended Properties.


I suspect the "empty columns" issue that you described is another side-effect of the fact that Excel is a non-standard database. In SQL Server, those empty columns would presumably contain a NULL, which is not the same thing as empty.


You didn't say whether you're exporting Excel-to-CSV from Excel or from SSIS (or the Wizard). If the latter, then this sounds like another problem with our flat file handling that could conceivably be fixed - treat an empty column at the end of a row like a column that contains a null. Maybe this one isn't entirely an Excel driver issue.


FYI I wrote a simple, rough programming sample of a variation on the Excel connection manager that puts an IMEX checkbox right in the ui: msdn.microsoft.com/en-us/library/ms365193.aspx">msdn.microsoft.com/.../ms365193.aspx. However this would need more robust error handling to be used in production.


Best wishes,


-Doug


Peter

Just an FYI - if you export Excel to a flat-file or copy/paste into a text file, you can test this behavior yourself.


Example:


Col1, Col2, Col3


val1, val2, val3


val4, val5 <-- note that there is no comma here to correspond to Col3


Maybe this changed with Excel 2007, but it bit me several times in Excel 2003 and lower so that I had to add a dummy column of some sort after Col3 to hold some value for every row in the sheet.  Otherwise, I couldn't count on it being there when I exported to text.  As for the NULL - not a big deal on an import.  I would just like it to work when I'm forced to go to Text files.  Sadly, that's faster for me than writing up a new package and all of the wiring up and such that goes with that.


There are workarounds to handle this in SSIS 2005 and maybe 2008 handles the EOL character better than 2005 did so I don't have to write script components to honor the EOL if it comes early.  I haven't tested it yet, but will soon.  To me, saying that there's a workaround of writing a script just isn't a great workaround for those people who just need to get a job done.  (And please, don't take this as anything personal - I appreciate the article series quite a bit.)


I agree that most of this comes from the fact that Excel is a non-standard database.  Unfortunately, it's also the easiest one for most people to actually use and that leads to knowing the various tricks come conversion time.  Smile


I'll check out that Excel handler to see what it looks like.  Perhaps that coupled with the example of scanning a ton of rows will lead to some ease.


ranomore

Another third-party option for exporting to Excel is FlexCel Studio from TMS Software. (http://www.tmssoftware.com/site/flexcelnet.asp">www.tmssoftware.com/.../flexcelnet.asp).


What I'm doing right now is basically using SSIS to generate/manipulate/prepare report data, and then letting users go to our website and pull down a FlexCel driven Excel version of it.


Thanks for the script task link! Hadn't even considered trying that.


Pingbacks and trackbacks (1)+