Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DataTable memory huge consumption

I´m loading csv data from files into a datatable for processing.

The problem is, that I want to process several files and my tests with the datatable shows me huge memory consumption I tested with a 37MB csv file and the memory growed up to 240MB, which is way to much IMHO. I read, that there is overhead in the datatable and I could live with about 70MB in size , but not 240MB, which means it is six times the original size. I read here, that datatables need more memory than POCOs, but that the difference is way too much.

I put on a memory profiler and looked, if I have memory leaks and where the memory is. I found, that the datatablecolumns have between 6MB and 19MB filled with strings and the datatable had about 20 columns. Are the values stored in the columns? Why is so much memory taken, what can I do to reduce memory consumption. With this memory consumption datattables seem to be unusable.

Had somebody else such problems with datatables, or I´m doing something wrong?

PS: I tried a 70MB file and the datatable growed up to 500MB!

OK here is a small testcase: The 37MB csv-file (21 columns) let the memory grow up to 179MB.

    private static DataTable ReadCsv()
    {
        DataTable table = new DataTable();
        table.BeginLoadData();

        using (var reader = new StreamReader(File.OpenRead(@"C:\Develop\Tests\csv-Data\testdaten\test.csv")))
        {               
            int y = 0;
            int columnsCount = 0;
            while (!reader.EndOfStream)
            {
                var line = reader.ReadLine();
                var values = line.Split(',');

                if (y == 0)
                {
                    columnsCount = values.Count();
                    // create columns
                    for (int x = 0; x < columnsCount; x++)
                    {
                        table.Columns.Add(new DataColumn(values[x], typeof(string)));
                    }
                }
                else
                {
                    if (values.Length == columnsCount)
                    {
                        // add the data
                        table.Rows.Add(values);
                    }
                }

                y++;
            }

            table.EndLoadData();
            table.AcceptChanges();

        }

        return table;
    }
like image 303
JohnnyBravo75 Avatar asked Aug 26 '13 16:08

JohnnyBravo75


1 Answers

DataSet and its children DataTable, DataRow, etc. make up an in-memory relational database. There is a lot of overhead involved (though it does make [some] things very convenient.

If memory is an issue,

  • Build domain objects to represent each row in your CSV file with typed properties.
  • Create a custom collection (or just use IList<T> to hold them
  • Alternatively, build a light-weight class with the basic semantics of a DataTable:
    • the ability to select a row by number
    • the ability to select a column within a row by row number and either column name or number.
    • The ability to know the ordered set of column names
    • Bonus: The ability to select a column by name or ordinal number and receive a list of its values, one per row.

Are you sure you need an in-memory representation of your CSV files? Could you access them via an IDataReader like Sebastien Lorion's Fast CSV Reader?

like image 137
Nicholas Carey Avatar answered Nov 06 '22 03:11

Nicholas Carey