Excel data extraction - Issue with column data type

I am writing a C# library to read in Excel files (both xls and xlsx) and I'm coming across an issue.

The problem is the same one described in this question: if a column in my Excel file holds string values but has a numeric value in the first row, the OLEDB provider assumes the column is numeric and returns NULL for every value in that column that is not numeric.

I am aware that, as in the answer provided there, I can make a change in the registry. However, this is a library I plan to use on many machines, and I don't want to change every user's registry values, so I was wondering if there is a better solution.

Maybe a DB provider other than ACE.OLEDB (and it seems JET is no longer supported well enough to be considered)?

Also, since this needs to work on XLS / XLSX, options such as EPPlus / XML readers won't work for the xls version.

John Bustos asked Jul 28 '15 15:07


1 Answer

Your connection string should look like this:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\myFolder\myExcelfile.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";

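IMEX=1 is the part of the connection string that tells the provider to read columns with mixed data types as text. This should work without editing the registry. Be aware that the provider still decides whether a column is mixed by sampling only the first rows (8 by default, controlled by the TypeGuessRows setting), so a column whose sampled rows are all numeric may still come back as numeric.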

HDR=YES simply marks the first row as column headers. It is not needed for your particular problem, but I've included it anyway.

Always using IMEX=1 is a safer way to retrieve data from columns with mixed data types.

Source: https://www.connectionstrings.com/excel/
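Since the library has to handle both xls and xlsx, the Extended Properties value can be chosen from the file extension. The following is only a minimal sketch: the class name ExcelConnectionString and the method Build are hypothetical, and it assumes the ACE provider is installed and that "Excel 8.0" is the format name to use for .xls files (as listed on the connectionstrings.com page above).

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the original answer): builds an ACE OLEDB
// connection string with IMEX=1 for either an .xls or an .xlsx file.
public static class ExcelConnectionString
{
    public static string Build(string path)
    {
        // Pick the Extended Properties format name from the file extension.
        string ext = Path.GetExtension(path).ToLowerInvariant();
        string format;
        switch (ext)
        {
            case ".xls":
                format = "Excel 8.0";       // assumed format name for the older binary workbooks
                break;
            case ".xlsx":
                format = "Excel 12.0 Xml";  // same value as in the answer
                break;
            default:
                throw new NotSupportedException("Unsupported Excel extension: " + ext);
        }

        // HDR=YES: first row holds column headers. IMEX=1: read mixed columns as text.
        return "Provider=Microsoft.ACE.OLEDB.12.0;"
             + "Data Source=" + path + ";"
             + "Extended Properties=\"" + format + ";HDR=YES;IMEX=1\";";
    }
}
```

Calling ExcelConnectionString.Build(@"C:\test.xlsx") yields the same connection string that is used in the edit below, with a trailing semicolon.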

Edit:

Here is the data I'm using:

[image: data]

Here is the output:

[image: output]

This is the exact code I used:

string connString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\test.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1""";

using (DbClass db = new DbClass(connString))
{
    var x = db.dataReader("SELECT * FROM [Sheet1$]");
    while (x.Read())
    {
        for (int i = 0; i < x.FieldCount; i++)
            Console.Write(x[i] + "\t");
        Console.WriteLine("");
    }
}

The DbClass is a simple wrapper I made in order to make life easier. It can be found here:

http://tech.reboot.pro/showthread.php?tid=4713
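If the wrapper is not at hand, the same loop can be written against the standard System.Data.IDataReader interface, which OleDbDataReader implements. This is a rough sketch rather than the code used above: RowDumper and Dump are hypothetical names, and writing the text NULL for DBNull cells is an added assumption so that values the provider failed to convert are easy to spot.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical stand-in for the print loop above; works with any IDataReader.
public static class RowDumper
{
    // Writes each row as one tab-separated line.
    // DBNull cells are written as the literal text NULL.
    public static void Dump(IDataReader reader, TextWriter output)
    {
        while (reader.Read())
        {
            var cells = new string[reader.FieldCount];
            for (int i = 0; i < reader.FieldCount; i++)
            {
                cells[i] = reader.IsDBNull(i)
                    ? "NULL"
                    : Convert.ToString(reader.GetValue(i));
            }
            output.WriteLine(string.Join("\t", cells));
        }
    }
}

// Usage with the ACE provider (needs System.Data.OleDb; Windows only):
//
//   using (var conn = new OleDbConnection(connString))
//   using (var cmd = new OleDbCommand("SELECT * FROM [Sheet1$]", conn))
//   {
//       conn.Open();
//       using (var reader = cmd.ExecuteReader())
//           RowDumper.Dump(reader, Console.Out);
//   }
```

Taking an IDataReader instead of an OleDbDataReader keeps the loop independent of the provider, so it can also be exercised with an in-memory DataTable via CreateDataReader().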

Cory answered Oct 10 '22 19:10