
Read a big Excel document

I want to know the fastest way to read cells in Excel. I have an Excel file with 50,000 rows and I want to read it quickly. I only need the first column, and with an OleDb connection it takes about 15 seconds. Is there a faster way?

Thanks

Sebastien asked Mar 11 '13 12:03


2 Answers

Here is a method that relies on using Microsoft.Office.Interop.Excel.

Please note: the Excel file I used had a single column of data with 50,000 entries.

1) Open the file with Excel, save it as CSV, and close Excel.

2) Use StreamReader to quickly read the CSV data.

3) Split the data on carriage return + line feed ("\r\n") and add the lines to a string list.

4) Delete the temporary CSV file.

I used System.Diagnostics.Stopwatch to time the execution, and the function took 1.5568 seconds to run.
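For anyone who wants to repeat the measurement, here is a minimal sketch of timing a call with Stopwatch. The `Timing` class and `Measure` method are hypothetical names used for illustration; they are not part of the original answer.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the given action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as `Timing.Measure(() => ExcelReader(path)).TotalSeconds` would then give the run time in seconds, where `path` is whatever workbook you are testing with. A single run includes the cost of starting Excel, so results will vary from machine to machine.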

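// Requires: using System; using System.Collections.Generic; using System.IO;
// and a reference to Microsoft.Office.Interop.Excel.
public static List<string> ExcelReader( string fileLocation )
{
    string csvPath = fileLocation + ".csv";
    Microsoft.Office.Interop.Excel.Application excel =
        new Microsoft.Office.Interop.Excel.Application();
    // Suppress prompts such as the overwrite confirmation for an existing CSV.
    excel.DisplayAlerts = false;
    Microsoft.Office.Interop.Excel.Workbook workBook =
        excel.Workbooks.Open(fileLocation);
    // Save the active sheet as CSV next to the original file.
    workBook.SaveAs(
        csvPath,
        Microsoft.Office.Interop.Excel.XlFileFormat.xlCSVWindows
    );
    // The CSV is already written, so close without saving again.
    workBook.Close(false);
    excel.Quit();
    List<string> valueList = null;
    using (StreamReader sr = new StreamReader(csvPath)) {
        string content = sr.ReadToEnd();
        // One list entry per non-empty line.
        valueList = new List<string>(
            content.Split(
                new string[] {"\r\n"},
                StringSplitOptions.RemoveEmptyEntries
            )
        );
    }
    // Remove the temporary CSV file.
    File.Delete(csvPath);
    return valueList;
}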
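If the workbook has more than one column, each CSV line will contain several comma-separated fields. A possible variant of the reading step, sketched below, streams the file line by line and keeps only the first field. The `CsvFirstColumn` class is a hypothetical helper, and it assumes that fields contain no quoted commas or embedded line breaks; real CSV data with such fields would need a proper parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvFirstColumn
{
    // Streams the CSV line by line and returns the first field of each non-empty line.
    // Assumption: fields contain no quoted commas or embedded line breaks.
    public static List<string> Read(string csvPath)
    {
        List<string> values = new List<string>();
        foreach (string line in File.ReadLines(csvPath))
        {
            if (line.Length == 0) continue;
            int comma = line.IndexOf(',');
            // No comma means the line holds a single field.
            values.Add(comma < 0 ? line : line.Substring(0, comma));
        }
        return values;
    }
}
```

File.ReadLines avoids holding the whole file in one string, which may matter for files much larger than 50,000 rows.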

Resources:

http://www.codeproject.com/Articles/5123/Opening-and-Navigating-Excel-with-C

How to split strings on carriage return with C#?

jiverson answered Oct 14 '22 17:10


Can you post your code for reading the 50,000 records with the OLE DB provider? When I tried it, reading 50,000 records with 3 columns took 4-5 seconds. Here is how I did it; have a look, it may help you. :)

// txtPath.Text is the path to the Excel file.
string conString = "Provider=Microsoft.ACE.OLEDB.12.0;"
    + "Data Source=" + txtPath.Text + ";"
    + "Extended Properties=\"Excel 12.0;HDR=YES;\"";

DataTable dt = new DataTable();

// The using blocks close the connection and dispose the command, even if an exception is thrown.
using (OleDbConnection oleCon = new OleDbConnection(conString))
using (OleDbCommand oleCmd = new OleDbCommand("SELECT field1, field2, field3 FROM [Sheet1$]", oleCon))
{
    oleCon.Open();
    // Load all rows returned by the query into the DataTable.
    using (OleDbDataReader reader = oleCmd.ExecuteReader())
    {
        dt.Load(reader);
    }
}

If you post your code here, I can try to fix it. :)
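Since the question only needs the first column, one thing that might be worth trying is selecting just that column and reading it with a forward-only OleDbDataReader instead of filling a DataTable. This is only a sketch and has not been timed. `FirstColumnReader` is a hypothetical class, and it assumes the sheet is named Sheet1 and has no header row, in which case the provider exposes the first column as F1. On .NET Core or later, System.Data.OleDb comes from a separate NuGet package and works on Windows only.

```csharp
using System;
using System.Collections.Generic;
using System.Data.OleDb;

public static class FirstColumnReader
{
    // Builds an ACE OLE DB connection string for an Excel workbook.
    public static string BuildConnectionString(string path, bool hasHeader)
    {
        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
               ";Extended Properties=\"Excel 12.0;HDR=" + (hasHeader ? "YES" : "NO") + ";\"";
    }

    // Reads only the first column of the sheet, row by row, without building a DataTable.
    // Assumptions: the sheet is named Sheet1 and has no header row (HDR=NO),
    // so the first column is addressed as F1.
    public static List<string> Read(string path)
    {
        List<string> values = new List<string>();
        using (OleDbConnection connection = new OleDbConnection(BuildConnectionString(path, false)))
        using (OleDbCommand command = new OleDbCommand("SELECT F1 FROM [Sheet1$]", connection))
        {
            connection.Open();
            using (OleDbDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Empty cells come back as DBNull; store them as empty strings.
                    values.Add(reader.IsDBNull(0) ? string.Empty : Convert.ToString(reader.GetValue(0)));
                }
            }
        }
        return values;
    }
}
```

If the sheet does have a header row, pass `true` for `hasHeader` and select the column by its header name instead of F1.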

Hitesh answered Oct 14 '22 15:10