C# serialize large array to disk

I have a very large graph stored in a single-dimensional array (about 1.1 GB), which fits in memory on my machine running Windows XP with 2 GB of RAM and 2 GB of virtual memory. I can generate the entire data set in memory, but when I try to serialize it to disk with BinaryFormatter, the file grows to about 50 MB and then an out-of-memory exception is thrown. The code I use to write it is the same code I use for all of my smaller problems:

StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
{
    bf.Serialize(file, diskReady);
}

The search algorithm is very lightweight, and I am able to perform searches on this graph with no problems once it is in memory.

I really have 3 questions:

  1. Is there a more reliable way to write a large data set to disk? I would define "large" as a data set whose size approaches the amount of available memory, though I am not sure how accurate that is.

  2. Should I move to a more database centric approach?

  3. Can anyone point me to some literature on reading portions of a large data set from a disk file in C#?

Nick Larsen asked Jan 27 '26 20:01


2 Answers

Write the entries to the file yourself. One simple solution would be:

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
// File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes.
using (Stream file = File.Create(@"C:\temp\states.dat"))
{
  foreach (StateInformation si in diskReady)
  {
    using (MemoryStream ms = new MemoryStream())
    {
      // Serialize one element at a time, not the whole array.
      bf.Serialize(ms, si);
      byte[] ser = ms.ToArray();
      int len = ser.Length;
      // 4-byte little-endian length prefix; shift and mask before casting to byte.
      file.WriteByte((byte)(len & 0xFF));
      file.WriteByte((byte)((len >> 8) & 0xFF));
      file.WriteByte((byte)((len >> 16) & 0xFF));
      file.WriteByte((byte)((len >> 24) & 0x7F));
      file.Write(ser, 0, len);
    }
  }
}

Only the serialized form of a single StateInformation object needs to be held in memory at a time. To deserialise, you read four bytes, reconstruct the length, create a buffer of that size, fill it, and deserialise.
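The read side of this length-prefixed layout can be sketched as follows. This is a minimal illustration rather than the answerer's own code: the `LengthPrefixedFile` class and its method names are hypothetical, and it deliberately treats each record as an opaque `byte[]` so that it does not depend on any particular serializer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not from the original answer): frames opaque byte[] records.
public static class LengthPrefixedFile
{
    // Writes a 4-byte little-endian length followed by the payload bytes.
    public static void WriteRecord(Stream output, byte[] payload)
    {
        int len = payload.Length;
        output.WriteByte((byte)(len & 0xFF));
        output.WriteByte((byte)((len >> 8) & 0xFF));
        output.WriteByte((byte)((len >> 16) & 0xFF));
        output.WriteByte((byte)((len >> 24) & 0x7F));
        output.Write(payload, 0, len);
    }

    // Lazily yields one payload at a time, so only one record is in memory at once.
    public static IEnumerable<byte[]> ReadRecords(Stream input)
    {
        byte[] header = new byte[4];
        while (true)
        {
            int got = FillBuffer(input, header);
            if (got == 0) yield break; // clean end of file
            if (got < 4) throw new EndOfStreamException("Truncated length prefix.");

            // Rebuild the length from the four little-endian bytes.
            int len = header[0] | (header[1] << 8) | (header[2] << 16) | ((header[3] & 0x7F) << 24);

            byte[] payload = new byte[len];
            if (FillBuffer(input, payload) < len)
                throw new EndOfStreamException("Truncated record.");
            yield return payload;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until full or EOF.
    private static int FillBuffer(Stream input, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = input.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Each yielded array can then be wrapped in a `MemoryStream` and handed to `bf.Deserialize` to rebuild one StateInformation object at a time.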

All of the above could be seriously optimised for speed, memory use and disk-size if you create a more specialised format, but the above goes to show the principle.

Jon Hanna answered Jan 30 '26 10:01


In my experience, larger data sets like this are best written to disk manually rather than with the built-in serialization.

This may not be practical, depending on how complex your StateInformation class is, but if it is fairly simple you could read and write the binary data manually using a BinaryReader and BinaryWriter instead. These let you read and write most value types directly from and to the stream, in a fixed order dictated by your code.

This option should let you read and write your data quickly, although it becomes awkward if you later want to add fields to StateInformation or remove them, because you will have to manage upgrading your existing files.
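As a rough sketch of the BinaryWriter/BinaryReader approach: the question does not show the real StateInformation fields, so the three fields below, the `StateFile` class, and the version header are all assumptions made for illustration. Because every record has the same size in this layout, a single record can also be read by seeking straight to its offset, which touches on reading only portions of the file.

```csharp
using System;
using System.IO;

// Hypothetical stand-in: the real StateInformation fields are not shown in the question.
public struct StateInformation
{
    public int Id;
    public double Cost;
    public int ParentIndex;
}

public static class StateFile
{
    private const int FormatVersion = 1;      // assumed header field; bump when the layout changes
    private const int HeaderSize = 4 + 4;     // version + record count
    private const int RecordSize = 4 + 8 + 4; // Id + Cost + ParentIndex

    // Writes the header, then each record's fields in a fixed order.
    public static void WriteAll(Stream output, StateInformation[] states)
    {
        var writer = new BinaryWriter(output);
        writer.Write(FormatVersion);
        writer.Write(states.Length);
        foreach (StateInformation s in states)
        {
            writer.Write(s.Id);
            writer.Write(s.Cost);
            writer.Write(s.ParentIndex);
        }
        writer.Flush(); // the caller owns the stream and disposes it
    }

    // Reads one record by index without loading the others; needs a seekable stream.
    public static StateInformation ReadAt(Stream input, int index)
    {
        var reader = new BinaryReader(input);
        input.Seek(0, SeekOrigin.Begin);
        int version = reader.ReadInt32();
        if (version != FormatVersion)
            throw new InvalidDataException("Unsupported format version " + version);
        int count = reader.ReadInt32();
        if (index < 0 || index >= count)
            throw new ArgumentOutOfRangeException("index");

        // Fixed-size records make the offset a simple multiplication.
        input.Seek(HeaderSize + (long)index * RecordSize, SeekOrigin.Begin);
        var s = new StateInformation();
        s.Id = reader.ReadInt32();
        s.Cost = reader.ReadDouble();
        s.ParentIndex = reader.ReadInt32();
        return s;
    }
}
```

With a `FileStream` from `File.Create` for writing and `File.OpenRead` for reading, memory use stays at roughly one record plus the stream's buffer. The version number is one possible way to detect old files when the field layout changes; it does not by itself migrate them.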

Ian answered Jan 30 '26 08:01


