Jun
16
2008

Excel in Integration Services, Part 1 of 3: Connections and Components

This post is the 1st in a series of 3 planned installments about using Excel data files with Microsoft SQL Server Integration Services.

  1. Connections and components
  2. Tables and data types
  3. Issues and alternatives

The Excel Driver

Excel connectivity in SQL Server Integration Services uses the Jet provider and the Excel ISAM (indexed sequential access method) driver that is a component of the Microsoft Jet database engine.

Since the Jet provider expects an Access database file (*.mdb) by default, you have to give Jet additional information when you want it to use one of its ISAM drivers for a different file type like Excel. If you were writing your own connection string for the Excel driver, you would provide this additional information in the Extended Properties argument. For Excel 97-2003, the Excel connection manager handles this for you; for Excel 2007, you have to enter at least one value manually on the All tab of the Data Link Properties dialog box.

  • The one required value is the Excel version: use "Excel 8.0" for Excel 97, 2000, 2002, and 2003 (Excel versions 8 through 11), and "Excel 12.0" for Excel 2007.
  • The other commonly used value is "HDR=Yes" (or No), which corresponds to the First row has column names option in the Excel connection manager. If not specified, the default is "Yes". When you don't have headings, the driver assigns F1, F2, and so forth as temporary column headings. When you don't have headings but mistakenly tell the driver that you do, your first row of data disappears to serve as column headings.

You cannot use the driver to connect to a password-protected workbook. Don't waste time trying or arguing. However, the driver can connect to a password-protected workbook if the workbook is already open in the Excel application. But using the driver to read from a workbook that's open in Excel causes a memory leak that can cause Excel to stop responding.

The Excel driver is still 32-bit only

On a 64-bit computer, you have to run a package that uses Excel in 32-bit WOW mode. There is only a 32-bit version of the Excel driver. The program manager for ACE once told me that they hope to ship a 64-bit version of the ACE provider in the next version of Microsoft Office, Office 14. We'll see! (There's no Office 13.)

To get the 32-bit version of dtexec so you can run packages in 32-bit mode, make sure that you select Management Tools - Complete during SQL Server 2008 Setup, or Client Tools during SQL Server 2005 Setup, on a 64-bit computer.

For more information about 64-bit concerns, see 64-bit Considerations for SQL Server Integration Services.

Working with Excel 2003 and earlier

Working with Excel 2007 (12.0)

To work with Excel 2007 files, you have to use the "ACE" provider, which appears in the providers list as "Microsoft Office 12.0 Access Database Engine OLE DB Provider". The Office 12 team took the code for Jet and the Excel driver and updated it to work with the Excel 12/2007 file format (.xlsx). Of course it also works with previous versions of Excel file formats. If you don't have Office 2007, download the ACE provider at 2007 Office System Driver: Data Connectivity Components.

  • In the SQL Server Import and Export Wizard, select "Microsoft Office 12.0 Access Database Engine OLE DB Provider" as the data source or destination. Type or paste the full path and filename as the value of  Data Source on the Connection" tab of the Data Link Properties dialog box. Then be sure to add "Excel 12.0" as the value of Extended Properties on the All tab.
  • In an Integration Services package, use an OLE DB Connection Manager, OLE DB Source and OLE DB Destination with Excel 2007 files. Choose and configure the provider as just described. (The plan to update the Excel Connection Manager to offer Excel 2007 as an option was postponed past SQL Server 2008 RTM.)

In an Integration Services package, the Excel Connection Manager now supports the Excel 2007 file format. Also, in the SQL Server Import and Export Wizard, the "Excel" source or destination option now supports the Excel 2007 format - you do not have to select the ACE Provider. In both cases, however, the Excel 97-2003 format remains the default setting for now.

This information is summarized in Books Online in How to: Connect to an Excel Workbook.

Here's an interesting gotcha! The Foreach File enumerator secretly appends a wildcard to the end of the file extension that you specify. So let's say that you configure the Foreach File enumerator to return a list of Excel 2003 .xls files. Surprise! You also get any Excel 2007 .xlsx files that exist in the same path, possibly causing your package to fail. There appears to be no way to specify, "Give me .xls but not .xlsx".

See also

In our next installment...

In Tables and data types. we'll list the objects that represent tables in an Excel data source, and talk about how to query them. We'll also look at the data types recognized by the Excel driver, and the corresponding Integration Services data types, along with conversion issues.

- Doug

Comments (3) -

Peter

Really looking forward to the next installment.  Excel just gives me fits when trying to import data.  The worst are postal codes, especially in the US.  First column == 12345, some time later you get 12345-6789 for a Zip+4 and it imports as NULL because Excel expects a Float.  I realize that I can use SSIS and tweak all sorts of parameters, but when just trying to get through the task quickly using the wizard, it's a painful process.


I deal with a lot of different spreadsheets so they aren't all the same format and I don't have the time to custom-design each SSIS package, especially to handle type conversions from nvarchar to varchar.  I need to get through them quickly and move on.  I often export these problem files to Tab-delimited (after adding a "dummy" column to avoid the missing fields issue) and pull in that text file.  I'd love to hear if there are some ways to avoid this with the SSIS wizard.  Frankly, this is one of the areas where it seems that DTS had the advantage for me.


Good information about the xls and xlsx files in the same folder.  I didn't know that and will try to ensure that we push those into different folders for processing.  Perhaps a job that loops through the files first to move all xlsx files to a different path?  Not a huge issue at the moment, but good to know.


Bob Williamson

I have been looking for an SSIS error that I have seen several times ... and this addresses it ...     microsoft in thier ultimate wisdom decided to change excel, offer up Office 2007, and postpone the possiblity that developers and dba's actually use this stuff now.  Very frustrating in dealing with them but very refreshing when someone finally offers up an explanation as to what is giong on.   Thank You


dave

Thanks for saving me a lot of time!

Pingbacks and trackbacks (1)+