Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OleDB, Misses the first character of data

I have a CSV Reading code for ASP.NET application I maintain. This ASP.NET website is running fine from 3 yrs now, and CSV reading code that use Ole.JetDB.4.0 is doing its work fine, except that once in a while some CSV with more than 4K-5K records create a problem. Usually the problem is that a record at random position [random row] miss the first character of it.

CSV File is just bunch of name and addresses per row, and they are in ASNI Format. CSV is comma seperate, no data have "comma" in data and now enclosing of field in Single or Double quote. Also, it doesn't happen often, We use the same code for say 70K record upload they works fine, but some time say in 3 yrs about 3-4 files have this problem only, we upload about one file daily.

For those who need what I did

using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection
    ("Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties='text;HDR=Yes;FMT=Delimited';Data Source=" + HttpContext.Current.Server.MapPath("/System/SaleList/"))
{
   string sql_select = "select * from [" + this.FileName + "]";
   System.Data.OleDb.OleDbDataAdapter da = new System.Data.OleDb.OleDbDataAdapter();
   da.SelectCommand = new System.Data.OleDb.OleDbCommand(sql_select, conn);
   DataSet ds = new DataSet();
   // Read the First line of File to know the header
   string[] lines = System.IO.File.ReadAllLines(HttpContext.Current.Server.MapPath("/System/SaleList/") + FileName);
   string header = "";
   if (lines.Length > 0)
      header = lines[0];

   string[] headers = header.Split(',');
   CreateSchema(headers, FileName);
   da.Fill(ds, "ListData");
   DataTable dt = ds.Tables["ListData"];
}

And this code is working fine except the mention thing. I cut some unrelated part so, might not work by copy paste.

EDIT: More information

  1. I try to use ODBC with Microsoft Text Driver, then I use ACE Driver with OleDB. the result is same with all three drive.

  2. If I swap the problem record, with the preceding Row those rows are read quite well, until the next problem row [if more than one row is having problem in original file], if those are only problem row it works fine.

So from above it looks like that something is there that distract character counter, but how I can ensure it working smooth is still a quiz.

EDIT 2: I have submitted it as bug to Microsoft here : https://connect.microsoft.com/VisualStudio/feedback/details/811869/oledb-ace-driver-12-jet-4-0-or-odbc-text-driver-all-fail-to-read-data-properly-from-csv-text-file

like image 912
Sumit Gupta Avatar asked Dec 17 '13 18:12

Sumit Gupta


3 Answers

I would suggest you examine a problem file with a hex editor - inspect the line that causes the problem and the line immediately preceding it.

In particular look at the line terminators (CR/LF? CR only? LF only?) and look for any non-printable characters.

like image 104
Joe Avatar answered Oct 21 '22 18:10

Joe


Try using ACE Driver instead of JET (it's available on x86 and x64 servers, JET is only x86!)

using (System.Data.OleDb.OleDbConnection conn 
     = new System.Data.OleDb.OleDbConnection
    ("Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties="Excel 12.0 Xml;HDR=YES";
     Data Source=" + HttpContext.Current.Server.MapPath("/System/SaleList/"))
 {
like image 31
Emanuele Greco Avatar answered Oct 21 '22 18:10

Emanuele Greco


I got the same OleDB, Missing characters of data problem, see here:

enter image description here

The characters go missing because the Microsoft.Jet.OLEDB.4.0 driver tries to guess the column datatype. In my case its was treating the data as hexadecimal not alphanumeric.

Problematic oledbProviderString:

oledbProviderString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"
{0}\";Extended Properties=\"Text;HDR=No;FMT=Delimited\"";

To fix the problem I added TypeGuessRows=0

oledbProviderString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"  
{0}\";Extended Properties=\"Text;HDR=No;FMT=Delimited;TypeGuessRows=0\"";

Repro:

Create a Book1.csv file with this content:

KU88,G6,CC
KU88,F7,CC

Step through this code as pictured above.

private void button1_Click(object sender, EventArgs e)
{
    string folder = @"G:\Developers\Folder";
    ReproProblem(folder);
}

static string oledbProviderString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"Text;HDR=No;FMT=Delimited\"";

private void ReproProblem(string folderPath)
{
    using (OleDbConnection oledbConnection = new OleDbConnection(string.Format(oledbProviderString, folderPath)))
    {
        string sqlStatement = "Select * from [Book1.csv]";
        //open the connection
        oledbConnection.Open();
        //Create an OleDbDataAdapter for our connection
        OleDbDataAdapter adapter = new OleDbDataAdapter(sqlStatement, oledbConnection);
        //Create a DataTable and fill it with data
        DataTable table = new DataTable();
        adapter.Fill(table);
        //close the connection
        oledbConnection.Close();
    }
}
like image 1
Jeremy Thompson Avatar answered Oct 21 '22 20:10

Jeremy Thompson