We have a very large Dictionary<long, uint> (several million entries) as part of a high-performance C# application. When the application closes we serialise the dictionary to disk using BinaryFormatter and MemoryStream.ToArray(). The serialisation completes in about 30 seconds and produces a file about 200 MB in size. When we then try to deserialise the dictionary using the following code:
BinaryFormatter bin = new BinaryFormatter();
Stream stream = File.Open("filePathName", FileMode.Open);
Dictionary<long, uint> allPreviousResults =
(Dictionary<long, uint>)bin.Deserialize(stream);
stream.Close();
It takes about 15 minutes to return. We have tried alternatives, and the slow part is definitely bin.Deserialize(stream); the bytes themselves are read from the hard drive (a high-performance SSD) in under 1 second.
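For context, the save side is roughly the following (a simplified sketch, not the exact application code; the variable name allResults and the use of File.WriteAllBytes are illustrative):

BinaryFormatter bin = new BinaryFormatter();
using (var ms = new MemoryStream())
{
    bin.Serialize(ms, allResults);                     // allResults is the Dictionary<long, uint>
    File.WriteAllBytes("filePathName", ms.ToArray());  // write the buffered bytes to disk
}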
Can someone please point out what we are doing wrong, as we want the load time to be of the same order as the save time?
Regards, Marc
You may check out protobuf-net, or simply serialize it yourself, which will probably be the fastest you can get.
using System.Collections.Generic;
using System.IO;

class Program
{
    public static void Main()
    {
        // Build a test dictionary of 7.5 million entries.
        var dico = new Dictionary<long, uint>();
        for (long i = 0; i < 7500000; i++)
        {
            dico.Add(i, (uint)i);
        }

        // Serialise: write each key/value pair directly with a BinaryWriter.
        using (var stream = File.OpenWrite("data.dat"))
        using (var writer = new BinaryWriter(stream))
        {
            foreach (var key in dico.Keys)
            {
                writer.Write(key);
                writer.Write(dico[key]);
            }
        }

        dico.Clear();

        // Deserialise: read fixed-size pairs back until the end of the stream.
        using (var stream = File.OpenRead("data.dat"))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                var key = reader.ReadInt64();
                var value = reader.ReadUInt32();
                dico.Add(key, value);
            }
        }
    }
}
Size of the resulting file: 90 million bytes (85.8 MB).
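One small refinement on top of this, not part of the original answer: writing the entry count first lets the reader pre-size the dictionary and avoid repeated internal resizing while loading. A rough sketch, assuming the same dico as above:

using (var stream = File.OpenWrite("data.dat"))
using (var writer = new BinaryWriter(stream))
{
    // Prefix the file with the number of entries.
    writer.Write(dico.Count);
    foreach (var pair in dico)
    {
        writer.Write(pair.Key);
        writer.Write(pair.Value);
    }
}

using (var stream = File.OpenRead("data.dat"))
using (var reader = new BinaryReader(stream))
{
    int count = reader.ReadInt32();
    var loaded = new Dictionary<long, uint>(count);  // pre-sized up front
    for (int i = 0; i < count; i++)
    {
        loaded.Add(reader.ReadInt64(), reader.ReadUInt32());
    }
}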
Just to show similar serialization (to the accepted answer) via protobuf-net:
using System.Collections.Generic;
using ProtoBuf;
using System.IO;

[ProtoContract]
class Test
{
    [ProtoMember(1)]
    public Dictionary<long, uint> Data { get; set; }
}

class Program
{
    public static void Main()
    {
        Serializer.PrepareSerializer<Test>();

        var dico = new Dictionary<long, uint>();
        for (long i = 0; i < 7500000; i++)
        {
            dico.Add(i, (uint)i);
        }

        var data = new Test { Data = dico };
        using (var stream = File.OpenWrite("data.dat"))
        {
            Serializer.Serialize(stream, data);
        }

        dico.Clear();

        using (var stream = File.OpenRead("data.dat"))
        {
            Serializer.Merge<Test>(stream, data);
        }
    }
}
Size: 83 MB. But most importantly, you haven't had to do it all by hand, introducing bugs. It is fast too (and will be even faster in "v2").
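As a small aside (not part of the original answer), if you prefer to deserialise into a fresh object instead of merging into an existing instance, protobuf-net also exposes Serializer.Deserialize<T>; a sketch using the same Test class as above:

// Load into a new Test instance rather than merging into "data".
using (var stream = File.OpenRead("data.dat"))
{
    var loaded = Serializer.Deserialize<Test>(stream);
    Dictionary<long, uint> dico = loaded.Data;
}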
You may want to use a profiler to see if, behind the scenes, the deserializer is performing a bunch of on-the-fly reflection.
For now, if you don't want to use a database, try storing your objects as a flat file in a custom format. For example, the first line of the file gives the total number of entries in the dictionary, allowing you to instantiate the dictionary with a predetermined size. Have the remaining lines be a series of fixed-width key-value pairs representing all of the entries in your dictionary.
With your new file format, use a StreamReader to read the file line by line or in fixed blocks, and see whether this lets you read in your dictionary any faster.
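A minimal sketch of that flat-file idea (the file name, the space-separated layout, and the helper names are illustrative, not from the answer):

using System;
using System.Collections.Generic;
using System.IO;

static class FlatFileStore
{
    // Write the entry count first, then one "key value" pair per line.
    public static void Save(string path, Dictionary<long, uint> data)
    {
        using (var writer = new StreamWriter(path))
        {
            writer.WriteLine(data.Count);
            foreach (var pair in data)
            {
                writer.WriteLine("{0} {1}", pair.Key, pair.Value);
            }
        }
    }

    // Read the count, pre-size the dictionary, then parse each line.
    public static Dictionary<long, uint> Load(string path)
    {
        using (var reader = new StreamReader(path))
        {
            int count = int.Parse(reader.ReadLine());
            var data = new Dictionary<long, uint>(count);
            for (int i = 0; i < count; i++)
            {
                var parts = reader.ReadLine().Split(' ');
                data.Add(long.Parse(parts[0]), uint.Parse(parts[1]));
            }
            return data;
        }
    }
}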
There are several fast key-value NoSQL solutions out there, so why not try one of them? As an example, ESENT; somebody posted about it here on SO: ManagedEsent.
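For illustration only, a hypothetical sketch of what ManagedEsent's PersistentDictionary usage might look like (assuming long keys and uint values are among its supported column types; the directory name is made up):

using Microsoft.Isam.Esent.Collections.Generic;

class EsentSketch
{
    static void Main()
    {
        // The dictionary is backed by an ESENT database on disk, so there is
        // no separate bulk serialise/deserialise step at shutdown and startup.
        using (var dico = new PersistentDictionary<long, uint>("ResultsDb"))
        {
            dico[42L] = 7u;          // persisted write
            uint value = dico[42L];  // read back from the store
        }
    }
}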