Importing from Excel: IMEX and mixed data types

When we talk about using import mode, "IMEX=1," to resolve a common problem when importing from Excel, we sometimes give or get the impression that setting IMEX=1 automatically imports all data as text. This is an over-simplification.

In fact, IMEX=1 causes the value of the Registry setting, ImportMixedTypes, to be applied. ImportMixedTypes commonly has a value of Text. Therefore, if and only if you have a column that contains both numbers and strings, all the values in that column are imported as strings.

However, if your columns do not contain mixed data types - if each column is purely strings or purely numbers - then the numbers remain numbers, and are imported as numeric values and not as text. That is, ImportMixedTypes is not applicable and is not applied. For some users, this may be an important clarification: You can't use IMEX=1 to force everything to text.

Here are the Registry settings for Jet's Excel driver:

 Thanks to BI blogger and new MVP Todd McDermid for the comments that reminded me that my previous postings didn't clarify this point.



Comments (3) -


IIRC, doesn't IMEX = 1 also force Memo values to be nvarchar(255) equivalents? I like the idea because we get a lot of problems with ZIP+4 codes coming over as NULL, but when we set this, I seem to remember that Memo data can be truncated.  Does this also use the TypeGuessRows value to figure out the column type? If so, that will still give fits if you have the first X rows of the aforementioned ZIP+4 data being straight ZIP and the ZIP+4 values scattered throughout the table.  Just bringing this up because these sorts of columns have been a large pain point when trying to import Excel data into SQL Server using SSIS.



I admit that I have not experimented with every combination of settings and data types for the Excel driver. However, the driver does have to determine whether in fact a column has mixed data types, so it can decide whether to apply the ImportMixedTypes setting or not. Therefore I'm confident that TypeGuessRows still applies also.

A 5-digit Zip code will typically be recognized as a number, while a Zip+4 with the dash will be recognized as text. This is indeed a common case where IMEX can solve the mixed types problem.

And yes, if your column has mixed types or memo-length values, then the rows sampled by the driver (TypeGuessRows) have to include mixed types or a string > 255 to get the results that you expect.

Painful but possible!

My next little project is to hunt down more information about the maximum value for TypeGuessRows.



Thank you so much Doug. This is really helpfull

Pingbacks and trackbacks (2)+