I have a huge amount of geographic data represented in a simple object structure consisting only of structs. All of my fields are of value type.
public struct Child
{
    readonly float X;
    readonly float Y;
    readonly int myField;
}

public struct Parent
{
    readonly int id;
    readonly int field1;
    readonly int field2;
    readonly Child[] children;
}
The data is chunked up nicely into small portions of Parent[]s. Each array contains a few thousand Parent instances. I have way too much data to keep it all in memory, so I need to swap these chunks to disk and back. (One file would come to approx. 200-300 KB.)
What would be the most efficient way of serializing/deserializing the Parent[] to a byte[] for dumping to disk and reading back? Concerning speed, I am particularly interested in fast deserialization; write speed is not that critical. Would a simple BinarySerializer be good enough?
Or should I hack around with StructLayout (see the accepted answer)? I am not sure whether that would work with the array field Parent.children.
UPDATE: Response to comments - Yes, the objects are immutable (code updated), and indeed the children field is not a value type. 300 KB doesn't sound like much, but I have zillions of files like that, so speed does matter.
BinarySerializer is a very general serializer. It will not perform as well as a custom implementation.
Fortunately for you, your data consists of structs only. This means that you can fix a StructLayout for Child and just bit-copy the children array using unsafe code from a byte[] you have read from disk.
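A minimal sketch of what fixing the layout might look like; LayoutKind.Sequential with Pack = 1 is an assumption here, chosen so the in-memory bytes match the on-disk format with no padding:

using System.Runtime.InteropServices;

// Assumed layout: sequential and unpadded, so each Child is exactly 12 bytes.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Child
{
    public readonly float X;     // 4 bytes
    public readonly float Y;     // 4 bytes
    public readonly int myField; // 4 bytes
}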
For the parents it is not that easy, because you need to treat the children separately. I recommend using unsafe code to copy the bit-copyable fields from the byte[] you read, and deserializing the children separately.
Did you consider mapping all the children into memory using memory-mapped files? You could then reuse the operating system's cache facility and not deal with reading and writing at all.
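If you want to try that route, here is a minimal sketch using MemoryMappedFile; the file name is a placeholder, and the 12-byte stride assumes the packed layout above:

using System.IO.MemoryMappedFiles;

// Sketch: map a chunk file and read individual Child structs from it.
// The OS pages data in and out on demand; no explicit reads or writes needed.
using (var mmf = MemoryMappedFile.CreateFromFile("chunk0001.bin"))
using (var accessor = mmf.CreateViewAccessor())
{
    Child child;
    long i = 123;                      // index of the Child we want
    accessor.Read(i * 12, out child);  // 12 = sizeof(Child) with Pack = 1
}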
Zero-copy deserializing a Child[] looks like this (note it requires an unsafe context):

byte[] bytes = GetFromDisk();
fixed (byte* bytePtr = bytes)
{
    Child* childPtr = (Child*)bytePtr;
    // now treat childPtr as an array:
    var x123 = childPtr[123].X;

    // if we need a real array that can be passed around, we need to copy:
    int length = GetLengthOfDeserializedData();
    var childArray = new Child[length];
    for (int i = 0; i < length; i++)
    {
        childArray[i] = childPtr[i];
    }
}
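The write side is the mirror image. Here is a sketch (ToBytes is a hypothetical helper name) that bit-copies a Child[] into a byte[] ready for dumping to disk; it assumes the fixed layout above, so sizeof(Child) is meaningful:

using System;
using System.Runtime.InteropServices;

// Sketch: bit-copy the whole array to a byte[] in one Marshal.Copy call.
static unsafe byte[] ToBytes(Child[] children)
{
    var bytes = new byte[children.Length * sizeof(Child)];
    fixed (Child* src = children)
    {
        Marshal.Copy((IntPtr)src, bytes, 0, bytes.Length);
    }
    return bytes;
}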
If you don't fancy going down the write-your-own-serializer route, you can use the protobuf.net serializer. Here's the output from a small test program:
Using 3000 parents, each with 5 children
BinaryFormatter Serialized in: 00:00:00.1250000
Memory stream 486218 B
BinaryFormatter Deserialized in: 00:00:00.1718750
ProtoBuf Serialized in: 00:00:00.1406250
Memory stream 318247 B
ProtoBuf Deserialized in: 00:00:00.0312500
It should be fairly self-explanatory. This was just one run, but it was fairly indicative of the speed-up I saw (3-5x).
To make your structs serializable (with protobuf.net), just add the following attributes:
[ProtoContract]
[Serializable]
public struct Child
{
    [ProtoMember(1)] public float X;
    [ProtoMember(2)] public float Y;
    [ProtoMember(3)] public int myField;
}

[ProtoContract]
[Serializable]
public struct Parent
{
    [ProtoMember(1)] public int id;
    [ProtoMember(2)] public int field1;
    [ProtoMember(3)] public int field2;
    [ProtoMember(4)] public Child[] children;
}
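Serializing and deserializing is then a one-liner each. A minimal round-trip sketch (the file path is a placeholder):

using System.IO;
using ProtoBuf;

// Write a chunk to disk.
using (var stream = File.Create("chunk0001.bin"))
{
    Serializer.Serialize(stream, parents);
}

// Read it back.
using (var stream = File.OpenRead("chunk0001.bin"))
{
    Parent[] roundTripped = Serializer.Deserialize<Parent[]>(stream);
}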
UPDATE:
Actually, writing a custom serializer is pretty easy; here is a bare-bones implementation:
using System.IO;

class CustSerializer
{
    public void Serialize(Stream stream, Parent[] parents)
    {
        var sw = new BinaryWriter(stream);
        foreach (var parent in parents)
        {
            // Write the parent's scalar fields, then its children inline.
            sw.Write(parent.id);
            sw.Write(parent.field1);
            sw.Write(parent.field2);
            foreach (var child in parent.children)
            {
                sw.Write(child.myField);
                sw.Write(child.X);
                sw.Write(child.Y);
            }
        }
    }

    public Parent[] Deserialize(Stream stream, int parentCount, int childCount)
    {
        var br = new BinaryReader(stream);
        var parents = new Parent[parentCount];
        for (int i = 0; i < parentCount; i++)
        {
            // Read fields back in exactly the order they were written.
            var parent = new Parent();
            parent.id = br.ReadInt32();
            parent.field1 = br.ReadInt32();
            parent.field2 = br.ReadInt32();
            parent.children = new Child[childCount];
            for (int j = 0; j < childCount; j++)
            {
                var child = new Child();
                child.myField = br.ReadInt32();
                child.X = br.ReadSingle();
                child.Y = br.ReadSingle();
                parent.children[j] = child;
            }
            parents[i] = parent;
        }
        return parents;
    }
}
And here is its output when run in a simple speed test:
Custom Serialized in: 00:00:00
Memory stream 216000 B
Custom Deserialized in: 00:00:00.0156250
Obviously, it's a lot less flexible than the other approaches, but if speed really is that important it's about 2-3x faster than the protobuf method. It produces minimal file sizes as well, so writing to disk should be faster.
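Usage is straightforward; here is a sketch writing one chunk and reading it back ("chunk0001.bin" and the counts are placeholders, matching the 3000x5 test above):

using System.IO;

var serializer = new CustSerializer();

// Write one chunk; BufferedStream cuts down on many small writes.
using (var stream = new BufferedStream(File.Create("chunk0001.bin")))
{
    serializer.Serialize(stream, parents);
}

// Read it back. parentCount and childCount must be known up front,
// e.g. stored in a small header or implied by the file naming scheme.
using (var stream = File.OpenRead("chunk0001.bin"))
{
    Parent[] loaded = serializer.Deserialize(stream, 3000, 5);
}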