
Import huge Excel sheets to a DataTable

Tags:

c#

excel

oledb

I have an Excel sheet with two tabs.

One is approximately 700k rows and the other is around 25k. The issue is that when I load the file, my memory gets eaten up and the application crashes. How can I handle huge files, given that some might be more than a million rows?

Here is the code I am currently using:

OleDbConnection cnn = new OleDbConnection("provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + fileName + "';Extended Properties=Excel 12.0;");
cnn.Open();

string qry = "SELECT * FROM [Detail$]";
OleDbDataAdapter odp = new OleDbDataAdapter(qry, cnn);
odp.Fill(detailTable);
DataSet tmp = new DataSet();
if (detailTable.Rows.Count > 0)
{
    Console.WriteLine("Total " + detailTable.Rows.Count + " Detail rows Loaded");
    // MessageBox.Show("Input Sheet UPLOADED !");
}

qry = "SELECT * FROM [Gallery$]";
OleDbDataAdapter odp1 = new OleDbDataAdapter(qry, cnn);
odp1.Fill(galleryTable);
if (galleryTable.Rows.Count > 0)
{
    Console.WriteLine("Total " + galleryTable.Rows.Count + " Gallery Numbers Loaded");
    // MessageBox.Show("Input Sheet UPLOADED !");
}
asked Jan 21 '26 by confusedMind

2 Answers

What I can suggest is to use the DbDataAdapter.Fill(Int32, Int32, DataTable[]) overload of the DbDataAdapter class to work in "chunk" mode:

public int Fill(
    int startRecord,
    int maxRecords,
    params DataTable[] dataTables
)

Using this method and the code example below, you can process the Excel data a chunk of rows at a time instead of holding all of it in memory at once. Dispose of the temporary DataTable after each fill and memory usage stays bounded.

Here is how you can do it:

const string fileName = "myData.xlsx";
const string excelConnString = "provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + fileName + "';Extended Properties=Excel 12.0;";

using (var cnn = new OleDbConnection(excelConnString))
{
    cnn.Open();

    // First, find out how many rows the sheet contains.
    const string countQuery = "SELECT COUNT(*) FROM [Detail$]";
    using (var cmd = new OleDbCommand(countQuery, cnn))
    using (var reader = cmd.ExecuteReader())
    {
        reader.Read();
        var rowsCount = Convert.ToInt32(reader[0]);

        const string query = "SELECT * FROM [Detail$]";
        using (var odp = new OleDbDataAdapter(query, cnn))
        {
            var detailTable = new DataTable();
            var recordToStartFetchFrom = 0; // zero-based record number to start with
            const int chunkSize = 100;
            while (recordToStartFetchFrom < rowsCount)
            {
                // Shrink the last chunk so the final Fill never asks for
                // more rows than actually remain.
                var diff = rowsCount - recordToStartFetchFrom;
                int internalChunkSize = diff < chunkSize ? diff : chunkSize;
                odp.Fill(recordToStartFetchFrom, internalChunkSize, detailTable);

                foreach (DataRow row in detailTable.Rows)
                {
                    Console.WriteLine("{1} {0}", row.ItemArray[0], row.ItemArray[1]);
                }

                Console.WriteLine("--------- {0}-{1} Rows Processed ---------", recordToStartFetchFrom, recordToStartFetchFrom + internalChunkSize);

                recordToStartFetchFrom += chunkSize;

                // Throw the processed chunk away so memory usage stays flat.
                detailTable.Dispose();
                detailTable = new DataTable();
            }
        }
        Console.ReadLine();
    }
}
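One caveat: with this overload the provider may still read through and discard the skipped rows on each call, so later chunks can get progressively slower on very large sheets. If that becomes a problem, a minimal alternative sketch (assuming the same connection string and sheet name as above) is to stream the rows once with a forward-only OleDbDataReader, which keeps only the current row in memory:

using System;
using System.Data.OleDb;

class StreamDetailSheet
{
    static void Main()
    {
        const string fileName = "myData.xlsx"; // hypothetical workbook name
        const string excelConnString =
            "provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + fileName +
            "';Extended Properties=Excel 12.0;";

        using (var cnn = new OleDbConnection(excelConnString))
        using (var cmd = new OleDbCommand("SELECT * FROM [Detail$]", cnn))
        {
            cnn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                long rowCount = 0;
                while (reader.Read())
                {
                    // Only the current row is held in memory here;
                    // process it (or batch it for insert) and move on.
                    rowCount++;
                    if (rowCount % 10000 == 0)
                        Console.WriteLine("{0} rows processed...", rowCount);
                }
                Console.WriteLine("Total {0} rows processed", rowCount);
            }
        }
    }
}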
answered Jan 24 '26 by Yair Nevet


Since you need to load a large amount of data into memory (e.g. 1M rows × 1 KB per row ≈ 1 GB), your only reasonable option is to build a 64-bit (x64) application, as the address-space restriction on an x86 process (2 GB on a normal x86 system, up to 4 GB on an x64 system) will not allow you to allocate enough memory.
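As a quick sanity check (a minimal sketch, assuming .NET 4.0 or later), you can verify at runtime whether your process is actually running as x64 before counting on the larger address space:

using System;

class BitnessCheck
{
    static void Main()
    {
        // An x86 build still reports a 4-byte pointer even on a 64-bit OS,
        // so check the process, not just the operating system.
        Console.WriteLine("64-bit OS:      {0}", Environment.Is64BitOperatingSystem);
        Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess);
        Console.WriteLine("Pointer size:   {0} bytes", IntPtr.Size); // 4 = x86, 8 = x64
    }
}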

Notes:

  • Going x64 may not be possible in some cases (e.g. if native libraries used via P/Invoke are not available as x64 builds).
  • You may be able to fit more rows in memory in an x86 process by using carefully designed custom classes for rows: some values stored as strings in the source can be represented in memory as a 1-4 byte enum instead (see the sketch after this list).
  • Consider pushing the search down to the DB/Excel side instead of implementing a custom in-memory search.
  • If you stick with x86, make sure not to keep more than one copy of the data in memory.
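To illustrate the second note, here is a minimal sketch (the OrderStatus column and its values are hypothetical) of replacing a repeated string value with a one-byte enum at load time:

using System;

// Hypothetical: a column that only ever holds a handful of distinct strings
// can be stored as a byte-backed enum, costing 1 byte per row instead of a
// string reference plus a separate string object per row.
enum OrderStatus : byte
{
    Pending,
    Shipped,
    Delivered
}

class DetailRow
{
    public int Id;             // 4 bytes
    public OrderStatus Status; // 1 byte instead of a string reference

    // Convert the raw cell text to the compact form once, while loading.
    public static OrderStatus ParseStatus(string cell)
    {
        return (OrderStatus)Enum.Parse(typeof(OrderStatus), cell, true);
    }
}

Across a million rows the saving compounds: a string column that would occupy tens of megabytes of heap objects collapses to roughly one megabyte of enum values.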
answered Jan 24 '26 by Alexei Levenkov