
Serialising and Deserialising a Very Large Dictionary in C#

Tags:

c#

dictionary

We have a very large Dictionary<long,uint> (several million entries) as part of a high-performance C# application. When the application closes, we serialise the dictionary to disk using BinaryFormatter and MemoryStream.ToArray(). Serialisation takes about 30 seconds and produces a file of about 200MB. We then try to deserialise the dictionary using the following code:

BinaryFormatter bin = new BinaryFormatter();
Stream stream = File.Open("filePathName", FileMode.Open);
Dictionary<long, uint> allPreviousResults =
    (Dictionary<long, uint>)bin.Deserialize(stream);
stream.Close();

This takes about 15 minutes to return. We have tried alternatives, and the slow part is definitely bin.Deserialize(stream); the bytes themselves are read from the hard drive (a high-performance SSD) in under 1 second.

Can someone please point out what we are doing wrong? We want the load time to be of the same order as the save time.

Regards, Marc

MarcF asked Jun 25 '10 12:06


4 Answers

You could check out protobuf-net, or simply serialize it yourself, which will probably be the fastest you can get.

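using System.Collections.Generic;
using System.IO;

class Program
{
    public static void Main()
    {
        // Build a sample dictionary with 7.5 million entries.
        var dico = new Dictionary<long, uint>();
        for (long i = 0; i < 7500000; i++)
        {
            dico.Add(i, (uint)i);
        }

        // File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes.
        using (var stream = File.Create("data.dat"))
        using (var writer = new BinaryWriter(stream))
        {
            // Each entry is an 8-byte key followed by a 4-byte value.
            foreach (var pair in dico)
            {
                writer.Write(pair.Key);
                writer.Write(pair.Value);
            }
        }

        dico.Clear();
        using (var stream = File.OpenRead("data.dat"))
        using (var reader = new BinaryReader(stream))
        {
            // Read 12-byte entries until the end of the file is reached.
            while (stream.Position < stream.Length)
            {
                var key = reader.ReadInt64();
                var value = reader.ReadUInt32();
                dico.Add(key, value);
            }
        }
    }
}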

Size of the resulting file: 90 million bytes (85.8MB).
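A possible variation on the code above, offered only as a sketch: write the entry count first, so that the dictionary can be created with the right capacity on load and never has to grow while it is being filled. The class name DictionaryFile and its Save and Load methods are hypothetical names introduced for illustration, and the extra 4-byte header means the file is slightly larger than the size quoted above.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class, not part of the original answer.
public static class DictionaryFile
{
    // Writes a 4-byte entry count, then each entry as an 8-byte key and a 4-byte value.
    public static void Save(string path, Dictionary<long, uint> data)
    {
        using (var stream = File.Create(path))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(data.Count);
            foreach (var pair in data)
            {
                writer.Write(pair.Key);
                writer.Write(pair.Value);
            }
        }
    }

    // Reads the count first so the dictionary is allocated once at its final size.
    public static Dictionary<long, uint> Load(string path)
    {
        using (var stream = File.OpenRead(path))
        using (var reader = new BinaryReader(stream))
        {
            int count = reader.ReadInt32();
            var data = new Dictionary<long, uint>(count);
            for (int i = 0; i < count; i++)
            {
                long key = reader.ReadInt64();
                uint value = reader.ReadUInt32();
                data.Add(key, value);
            }
            return data;
        }
    }
}
```

Usage would be along the lines of DictionaryFile.Save("data.dat", dico) on shutdown and dico = DictionaryFile.Load("data.dat") on startup. Whether presizing makes a measurable difference for your data is something to time rather than assume.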

Darin Dimitrov answered Nov 11 '22 16:11


Just to show similar serialization (to the accepted answer) via protobuf-net:

using System.Collections.Generic;
using ProtoBuf;
using System.IO;

[ProtoContract]
class Test
{
    [ProtoMember(1)]
    public Dictionary<long, uint> Data {get;set;}
}

class Program
{
    public static void Main()
    {
        Serializer.PrepareSerializer<Test>();
        var dico = new Dictionary<long, uint>();
        for (long i = 0; i < 7500000; i++)
        {
            dico.Add(i, (uint)i);
        }
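        var data = new Test { Data = dico };
        // File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes.
        using (var stream = File.Create("data.dat"))
        {
            Serializer.Serialize(stream, data);
        }
        dico.Clear();
        using (var stream = File.OpenRead("data.dat"))
        {
            // Merge deserializes into the existing instance, refilling the cleared dictionary.
            Serializer.Merge<Test>(stream, data);
        }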
    }
}

Size: 83MB. More importantly, you haven't had to do it all by hand and risk introducing bugs. It is fast too (and will be even faster in "v2").

Marc Gravell answered Nov 11 '22 15:11


You may want to use a profiler to see if, behind the scenes, the deserializer is performing a bunch of on-the-fly reflection.

For now, if you don't want to use a database, try storing your objects as a flat file in a custom format. For example, the first line of the file gives the total number of entries in the dictionary, allowing you to instantiate a dictionary with a predetermined size. The remaining lines are a series of fixed-width key-value pairs representing all of the entries in your dictionary.

With your new file format, use a StreamReader to read in your file line by line or in fixed blocks, and see whether this allows you to read in your dictionary any faster.
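The flat-file idea could be sketched roughly as follows. The layout is an assumption made for illustration: the first line holds the entry count, and every following line is exactly 24 characters, 16 hexadecimal digits for the key followed by 8 for the value. The class name FlatFileDictionary and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical helper class illustrating one possible flat-file layout.
public static class FlatFileDictionary
{
    public static void Save(string path, Dictionary<long, uint> data)
    {
        using (var writer = new StreamWriter(path))
        {
            // First line: total number of entries.
            writer.WriteLine(data.Count.ToString(CultureInfo.InvariantCulture));
            foreach (var pair in data)
            {
                // Fixed width: 16 hex digits (key) + 8 hex digits (value).
                writer.WriteLine(pair.Key.ToString("X16") + pair.Value.ToString("X8"));
            }
        }
    }

    public static Dictionary<long, uint> Load(string path)
    {
        using (var reader = new StreamReader(path))
        {
            // The count lets the dictionary be created at its final size.
            int count = int.Parse(reader.ReadLine(), CultureInfo.InvariantCulture);
            var data = new Dictionary<long, uint>(count);
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Convert with base 16 reverses the two's-complement hex written by Save.
                long key = Convert.ToInt64(line.Substring(0, 16), 16);
                uint value = Convert.ToUInt32(line.Substring(16, 8), 16);
                data.Add(key, value);
            }
            return data;
        }
    }
}
```

A text format like this is easy to inspect by hand, but each entry takes 24 characters plus a line ending, so the file will be noticeably larger than a raw binary layout and the parsing cost should be measured before adopting it.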

Juliet answered Nov 11 '22 16:11


There are several fast key-value NoSQL solutions out there, so why not try one of them? One example is ESENT, which somebody posted about here on SO: managedesent.

ba__friend answered Nov 11 '22 16:11