Dougbert.com
Microsoft SQL Server Integration Services stuff by Douglas Laudenschlager.

Excel in Integration Services, Part 1 of 3: Connections and Components

This post is the 1st in a series of 3 planned installments about using Excel data files with Microsoft SQL Server Integration Services.

  1. Connections and components
  2. Tables and data types
  3. Issues and alternatives

The Excel Driver

Excel connectivity in SQL Server Integration Services uses the Jet provider and the Excel ISAM (indexed sequential access method) driver that is a component of the Microsoft Jet database engine.

Since the Jet provider expects an Access database file (*.mdb) by default, you have to give Jet additional information when you want it to use one of its ISAM drivers for a different file type like Excel. If you were writing your own connection string for the Excel driver, you would provide this additional information in the Extended Properties argument. For Excel 97-2003, the Excel connection manager handles this for you; for Excel 2007, you have to enter at least one value manually on the All tab of the Data Link Properties dialog box.

  • The one required value is the Excel version: use "Excel 8.0" for Excel 97, 2000, 2002, and 2003 (Excel versions 8 through 11), and "Excel 12.0" for Excel 2007.
  • The other commonly used value is "HDR=Yes" (or No), which corresponds to the First row has column names option in the Excel connection manager. If not specified, the default is "Yes". When you don't have headings, the driver assigns F1, F2, and so forth as temporary column headings. When you don't have headings but mistakenly tell the driver that you do, your first row of data disappears to serve as column headings.

You cannot use the driver to connect to a password-protected workbook. Don't waste time trying or arguing. However, the driver can connect to a password-protected workbook if the workbook is already open in the Excel application. But using the driver to read from a workbook that's open in Excel causes a memory leak that can cause Excel to stop responding.

The Excel driver is still 32-bit only

On a 64-bit computer, you have to run a package that uses Excel in 32-bit WOW mode. There is only a 32-bit version of the Excel driver. The program manager for ACE once told me that they hope to ship a 64-bit version of the ACE provider in the next version of Microsoft Office, Office 14. We'll see! (There's no Office 13.)

To get the 32-bit version of dtexec so you can run packages in 32-bit mode, make sure that you select Management Tools - Complete during SQL Server 2008 Setup, or Client Tools during SQL Server 2005 Setup, on a 64-bit computer.

For more information about 64-bit concerns, see 64-bit Considerations for SQL Server Integration Services.

Working with Excel 2003 and earlier

Working with Excel 2007 (12.0)

To work with Excel 2007 files, you have to use the "ACE" provider, which appears in the providers list as "Microsoft Office 12.0 Access Database Engine OLE DB Provider". The Office 12 team took the code for Jet and the Excel driver and updated it to work with the Excel 12/2007 file format (.xlsx). Of course it also works with previous versions of Excel file formats. If you don't have Office 2007, download the ACE provider at 2007 Office System Driver: Data Connectivity Components.

  • In the SQL Server Import and Export Wizard, select "Microsoft Office 12.0 Access Database Engine OLE DB Provider" as the data source or destination. Type or paste the full path and filename as the value of  Data Source on the Connection" tab of the Data Link Properties dialog box. Then be sure to add "Excel 12.0" as the value of Extended Properties on the All tab.
  • In an Integration Services package, use an OLE DB Connection Manager, OLE DB Source and OLE DB Destination with Excel 2007 files. Choose and configure the provider as just described. (The plan to update the Excel Connection Manager to offer Excel 2007 as an option was postponed past SQL Server 2008 RTM.)

This information is summarized in Books Online in How to: Connect to an Excel Workbook.

Here's an interesting gotcha! The Foreach File enumerator secretly appends a wildcard to the end of the file extension that you specify. So let's say that you configure the Foreach File enumerator to return a list of Excel 2003 .xls files. Surprise! You also get any Excel 2007 .xlsx files that exist in the same path, possibly causing your package to fail. There appears to be no way to specify, "Give me .xls but not .xlsx".

See also

In our next installment...

In Tables and data types. we'll list the objects that represent tables in an Excel data source, and talk about how to query them. We'll also look at the data types recognized by the Excel driver, and the corresponding Integration Services data types, along with conversion issues.

- Doug


Posted Jun 16 2008, 07:00 AM by dougbert
Filed under:

Comments

Peter wrote re: Excel in Integration Services, Part 1 of 3: Connections and Components
on 06-17-2008 9:05 AM

Really looking forward to the next installment.  Excel just gives me fits when trying to import data.  The worst are postal codes, especially in the US.  First column == 12345, some time later you get 12345-6789 for a Zip+4 and it imports as NULL because Excel expects a Float.  I realize that I can use SSIS and tweak all sorts of parameters, but when just trying to get through the task quickly using the wizard, it's a painful process.

I deal with a lot of different spreadsheets so they aren't all the same format and I don't have the time to custom-design each SSIS package, especially to handle type conversions from nvarchar to varchar.  I need to get through them quickly and move on.  I often export these problem files to Tab-delimited (after adding a "dummy" column to avoid the missing fields issue) and pull in that text file.  I'd love to hear if there are some ways to avoid this with the SSIS wizard.  Frankly, this is one of the areas where it seems that DTS had the advantage for me.

Good information about the xls and xlsx files in the same folder.  I didn't know that and will try to ensure that we push those into different folders for processing.  Perhaps a job that loops through the files first to move all xlsx files to a different path?  Not a huge issue at the moment, but good to know.

Add a Comment

(required)  
(optional)
(required)  
Remember Me?
Powered by Community Server (Non-Commercial Edition), by Telligent Systems