
strange out-of-memory exception during serialization

I am using VSTS2008 + C# + .Net 3.5 to run this console application on x64 Server 2003 Enterprise with 12G physical memory.

Here is my code. When the statement bformatter.Serialize(stream, table) executes, an out-of-memory exception is thrown. I monitored memory usage through the Performance tab of Task Manager and found that only 2G of physical memory was in use when the exception was thrown, so the machine should not be out of memory. :-(

Any ideas what is wrong? Is there a limitation in .NET serialization?

    static DataTable MakeParentTable()
    {
        // Create a new DataTable.
        System.Data.DataTable table = new DataTable("ParentTable");
        // Declare variables for DataColumn and DataRow objects.
        DataColumn column;
        DataRow row;

        // Create new DataColumn, set DataType, 
        // ColumnName and add to DataTable.    
        column = new DataColumn();
        column.DataType = System.Type.GetType("System.Int32");
        column.ColumnName = "id";
        column.ReadOnly = true;
        column.Unique = true;
        // Add the Column to the DataColumnCollection.
        table.Columns.Add(column);

        // Create second column.
        column = new DataColumn();
        column.DataType = System.Type.GetType("System.String");
        column.ColumnName = "ParentItem";
        column.AutoIncrement = false;
        column.Caption = "ParentItem";
        column.ReadOnly = false;
        column.Unique = false;
        // Add the column to the table.
        table.Columns.Add(column);

        // Make the ID column the primary key column.
        DataColumn[] PrimaryKeyColumns = new DataColumn[1];
        PrimaryKeyColumns[0] = table.Columns["id"];
        table.PrimaryKey = PrimaryKeyColumns;

        // Create 5,000,001 DataRow objects (ids 0 through 5000000)
        // and add them to the DataTable.
        for (int i = 0; i <= 5000000; i++)
        {
            row = table.NewRow();
            row["id"] = i;
            row["ParentItem"] = "ParentItem " + i;
            table.Rows.Add(row);
        }

        return table;
    }

    static void Main(string[] args)
    {
        DataTable table = MakeParentTable();
        Stream stream = new MemoryStream();
        BinaryFormatter bformatter = new BinaryFormatter();
        bformatter.Serialize(stream, table);   // out of memory exception here
        Console.WriteLine(table.Rows.Count);

        return;
    }

thanks in advance, George

George2 asked Aug 18 '09 04:08


4 Answers

Note: DataTable defaults to the XML serialization format that was used in .NET 1.*, which is incredibly inefficient. One thing to try is switching to the newer format:

 dt.RemotingFormat = System.Data.SerializationFormat.Binary;
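
To see what the switch does to the payload, the following is a minimal sketch that serializes a small table in both formats and prints the sizes. It assumes .NET Framework (BinaryFormatter is obsolete or unavailable on recent .NET versions); the class and helper names are hypothetical, and `dt` above corresponds to `table` in the question.

```csharp
using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class RemotingFormatSketch
{
    // Serializes the table with BinaryFormatter and returns the payload size in bytes.
    public static long SerializedSize(DataTable table, SerializationFormat format)
    {
        table.RemotingFormat = format;
        using (MemoryStream stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, table);
            return stream.Length;
        }
    }

    // Builds a small table shaped like the one in the question (hypothetical helper).
    public static DataTable MakeSmallTable(int rows)
    {
        DataTable table = new DataTable("ParentTable");
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("ParentItem", typeof(string));
        for (int i = 0; i < rows; i++)
        {
            table.Rows.Add(i, "ParentItem " + i);
        }
        return table;
    }

    static void Main()
    {
        DataTable table = MakeSmallTable(10000);
        Console.WriteLine("Xml:    " + SerializedSize(table, SerializationFormat.Xml));
        Console.WriteLine("Binary: " + SerializedSize(table, SerializationFormat.Binary));
    }
}
```

Measuring on a small table first gives a rough idea of how the payload would scale before trying the full 5,000,001 rows.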

Re the out-of-memory / 2GB; individual .NET objects (such as the byte[] behind a MemoryStream) are limited to 2GB. Perhaps try writing to a FileStream instead?

(edit: nope: tried that, still errors)

I also wonder if you may get better results (in this case) using table.WriteXml(stream), perhaps with compression such as GZIP if space is at a premium.
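
A minimal sketch of that idea, assuming the output goes to a file: WriteXml writes through a GZipStream so the XML is compressed as it is produced. The class name, method names and temp-file path are hypothetical, and whether this avoids the exception for the full table is untested here.

```csharp
using System;
using System.Data;
using System.IO;
using System.IO.Compression;

static class WriteXmlGZipSketch
{
    // Writes the table as XML (with inline schema) through a GZip stream into a file.
    public static void Save(DataTable table, string path)
    {
        using (FileStream file = File.Create(path))
        using (GZipStream gzip = new GZipStream(file, CompressionMode.Compress))
        {
            table.WriteXml(gzip, XmlWriteMode.WriteSchema);
        }
    }

    // Reads the table back; the inline schema lets an empty DataTable rebuild its columns.
    public static DataTable Load(string path)
    {
        DataTable table = new DataTable();
        using (FileStream file = File.OpenRead(path))
        using (GZipStream gzip = new GZipStream(file, CompressionMode.Decompress))
        {
            table.ReadXml(gzip);
        }
        return table;
    }

    static void Main()
    {
        // WriteXml requires a named table.
        DataTable table = new DataTable("ParentTable");
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("ParentItem", typeof(string));
        for (int i = 0; i < 1000; i++)
        {
            table.Rows.Add(i, "ParentItem " + i);
        }

        string path = Path.GetTempFileName();
        Save(table, path);
        Console.WriteLine("Compressed bytes: " + new FileInfo(path).Length);
        Console.WriteLine("Rows read back:   " + Load(path).Rows.Count);
        File.Delete(path);
    }
}
```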

Marc Gravell answered Nov 19 '22 15:11


As already discussed, this is a fundamental issue with trying to get contiguous blocks of memory that are gigabytes in size.

You will be limited by (in increasing difficulty):

  1. The amount of addressable memory
    • Since you are 64-bit, this will be your 12GB of physical memory, less any holes in it required by devices, plus any swap file space.
    • Note that you must be running an app with the relevant PE headers that indicate it can run as 64-bit, or you will run under WoW64 and only have 4GB of address space.
    • Also note that Visual Studio 2010 changed the default target for new executable projects to x86, a contentious change.
  2. The CLR's limitation that no single object may consume more than 2GB of space.
  3. Finding a contiguous block within the available memory.

You can find that you run out of space before reaching the CLR limit in point 2, because the backing buffer in the stream is expanded in a 'doubling' fashion, and this swiftly results in the buffer being allocated in the Large Object Heap. This heap is not compacted in the same way the other heaps are (1), and as a result the process of building up to the theoretical maximum buffer size under point 2 fragments the LOH, so you fail to find a sufficiently large contiguous block before reaching that maximum.

Thus a mitigation approach if you are close to the limit is to set the initial capacity of the stream such that it definitely has sufficient space from the start via one of the constructors.
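
A minimal sketch of that mitigation, using the MemoryStream(int capacity) constructor. The 64MB figure and the names are hypothetical; in practice the capacity would come from an estimate of the serialized size.

```csharp
using System;
using System.IO;

static class PreSizedStreamSketch
{
    // Allocates the backing buffer once, up front, instead of letting it double repeatedly.
    public static MemoryStream CreatePreSized(int expectedBytes)
    {
        return new MemoryStream(expectedBytes);
    }

    static void Main()
    {
        const int expected = 64 * 1024 * 1024; // assumed upper bound for the payload
        using (MemoryStream stream = CreatePreSized(expected))
        {
            int capacityBefore = stream.Capacity;
            byte[] chunk = new byte[81920];

            // Fill the stream without exceeding the reserved capacity.
            for (long written = 0; written + chunk.Length <= expected; written += chunk.Length)
            {
                stream.Write(chunk, 0, chunk.Length);
            }

            // The capacity is unchanged, so no reallocation (and no extra LOH garbage) occurred.
            Console.WriteLine("Reallocated: " + (stream.Capacity != capacityBefore));
        }
    }
}
```

The single allocation still has to find one contiguous block of the requested size, so this only helps when the request is made before the Large Object Heap has been fragmented.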

Given that you are writing to the memory stream as part of a serialization process it would make sense to actually use streams as intended and use only the data required.

  • If you are serializing to some file based location then stream it into that directly.
  • If this is data going into a SQL Server database, consider using:
    • FILESTREAM (SQL Server 2008 only, I'm afraid).
    • From 2005 onwards you can read/write in chunks, but writing is not well integrated into ADO.NET.
    • For versions prior to 2005 there are relatively unpleasant workarounds.
  • If you are serializing this in memory for use in say a comparison then consider streaming the data being compared as well and diffing as you go along.
  • If you are persisting an object in memory to recreate it later, then this really should be going to a file or a memory mapped file. In both cases the operating system is then free to structure it as best it can (in disk caches or pages being mapped in and out of main memory), and it is likely it will do a better job of this than most people are able to do themselves.
  • If you are doing this so that the data can be compressed then consider using streaming compression. Any block based compression stream can be fairly easily converted into a streaming mode with the addition of padding. If your compression API doesn't support this natively consider using one that does or writing the wrapper to do it.
  • If you are doing this to write to a byte buffer which is then pinned and passed to an unmanaged function, then use the UnmanagedMemoryStream instead; this stands a slightly better chance of being able to allocate a buffer of this sort of size but is still not guaranteed to do so.
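
As an illustration of the first bullet (streaming straight to a file-based location), here is a minimal sketch that writes the question's two columns one row at a time, so only one row's worth of data is buffered at once. The on-disk layout (a row count followed by id/string pairs) and all names are hypothetical, not a standard format.

```csharp
using System;
using System.Data;
using System.IO;

static class RowStreamingSketch
{
    // Writes the row count, then each (id, ParentItem) pair, directly to the output stream.
    public static void WriteRows(DataTable table, Stream output)
    {
        BinaryWriter writer = new BinaryWriter(output);
        writer.Write(table.Rows.Count);
        foreach (DataRow row in table.Rows)
        {
            writer.Write((int)row["id"]);
            writer.Write((string)row["ParentItem"]);
        }
        writer.Flush();
    }

    // Rebuilds a table from the layout produced by WriteRows.
    public static DataTable ReadRows(Stream input)
    {
        DataTable table = new DataTable("ParentTable");
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("ParentItem", typeof(string));

        BinaryReader reader = new BinaryReader(input);
        int count = reader.ReadInt32();
        for (int i = 0; i < count; i++)
        {
            int id = reader.ReadInt32();
            string item = reader.ReadString();
            table.Rows.Add(id, item);
        }
        return table;
    }

    static void Main()
    {
        DataTable table = new DataTable("ParentTable");
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("ParentItem", typeof(string));
        for (int i = 0; i < 1000; i++)
        {
            table.Rows.Add(i, "ParentItem " + i);
        }

        string path = Path.GetTempFileName();
        using (FileStream file = File.Create(path))
        {
            WriteRows(table, file);
        }
        using (FileStream file = File.OpenRead(path))
        {
            Console.WriteLine("Rows read back: " + ReadRows(file).Rows.Count);
        }
        File.Delete(path);
    }
}
```

This trades the generality of BinaryFormatter for a fixed schema, which is only reasonable when the columns are known in advance.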

Perhaps if you tell us what you are serializing an object of this size for we might be able to tell you better ways to do it.


  (1) This is an implementation detail you should not rely on.

ShuggyCoUk answered Nov 19 '22 15:11


1) The OS is x64, but is the app x64 (or AnyCPU)? If not, it is capped at 2GB of address space by default.

2) Does this happen 'early on', or after the app has been running for some time (i.e. n serializations later)? Could it maybe be a result of large object heap fragmentation...?
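
For point 1, a quick way to check from inside the process is the pointer size, which works on .NET 3.5 (the class name below is hypothetical):

```csharp
using System;

static class BitnessCheck
{
    // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit one (including under WoW64).
    public static bool Is64BitProcess()
    {
        return IntPtr.Size == 8;
    }

    static void Main()
    {
        Console.WriteLine(Is64BitProcess() ? "64-bit process" : "32-bit process");
    }
}
```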

KristoferA answered Nov 19 '22 14:11


Interestingly, it actually goes up to 3.7GB before giving a memory error here (Windows 7 x64). Apparently, it would need about double that amount to complete.

Given that the application uses 1.65GB after creating the table, it seems likely that it's hitting the 2GB byte[] (or any single object) limit that Marc Gravell mentions (1.65GB + 2GB ~= 3.7GB).

Based on this blog, I suppose you could allocate your memory using the WINAPI, and write your own MemoryStream implementation using that. That is, if you really wanted to do this. Or write one using more than one array of course :)
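
A rough sketch of the "more than one array" variant: a write-only stream that stores its data in fixed-size chunks, so no single array approaches the 2GB limit or even lands on the Large Object Heap. The class name, the 64KB chunk size and the WriteTo helper are all hypothetical choices, and whether this alone gets the full DataTable serialized is untested.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Write-only stream backed by many small arrays instead of one large one.
sealed class ChunkedWriteStream : Stream
{
    private const int ChunkSize = 64 * 1024; // below the 85,000-byte LOH threshold
    private readonly List<byte[]> chunks = new List<byte[]>();
    private long length;

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { return length; } }
    public int ChunkCount { get { return chunks.Count; } }

    public override long Position
    {
        get { return length; }
        set { throw new NotSupportedException(); }
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int used = (int)(length % ChunkSize);
            if (used == 0)
            {
                // Every existing chunk is full (or there are none yet): start a new one.
                chunks.Add(new byte[ChunkSize]);
            }
            byte[] current = chunks[chunks.Count - 1];
            int toCopy = Math.Min(count, ChunkSize - used);
            Buffer.BlockCopy(buffer, offset, current, used, toCopy);
            offset += toCopy;
            count -= toCopy;
            length += toCopy;
        }
    }

    // Copies the stored bytes to another stream, one chunk at a time.
    public void WriteTo(Stream destination)
    {
        long remaining = length;
        foreach (byte[] chunk in chunks)
        {
            int n = (int)Math.Min(remaining, (long)ChunkSize);
            destination.Write(chunk, 0, n);
            remaining -= n;
        }
    }

    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}

static class ChunkedStreamDemo
{
    static void Main()
    {
        ChunkedWriteStream stream = new ChunkedWriteStream();
        stream.Write(new byte[200000], 0, 200000);
        Console.WriteLine("Length: " + stream.Length + ", chunks: " + stream.ChunkCount);
    }
}
```

The total is still bounded by the process's address space, and the List of chunks itself grows by doubling, but it holds only references, so it stays small.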

Thorarin answered Nov 19 '22 14:11