 

BinaryFormatter alternatives

A BinaryFormatter-serialized array of 128³ doubles takes up 50 MB of space. Serializing an array of 128³ structs with two double fields takes up 150 MB and over 20 seconds to process.

Are there fast, simple alternatives that would generate compact files? My expectation is that the above examples would take up 16 and 32 MB, respectively, and under two seconds to process. I took a look at protobuf-net, but it appears that it does not even support arrays of structs.

PS: I apologize for making a mistake in recording file sizes. The actual space overhead with BinaryFormatter is not large.

Don Reba asked Nov 04 '09

People also ask

Why is BinaryFormatter insecure?

BinaryFormatter reads type information from the payload it deserializes and instantiates whatever types that payload names, which is a huge security risk because it makes it possible for an attacker to run arbitrary code.

Is binary formatter safe?

The BinaryFormatter type is dangerous and is not recommended for data processing. Applications should stop using BinaryFormatter as soon as possible, even if they believe the data they're processing to be trustworthy. BinaryFormatter is insecure and can't be made secure.

What is a binary formatter?

The class BinaryFormatter in C# performs the actions of “serialization” and “deserialization” of object graphs. It takes simple data such as integers (int), floating-point numbers (float), and strings (string) and converts them into a binary format.


2 Answers

If you use a BinaryWriter instead of a serializer, you will get the desired (minimal) size.
I'm not sure about the speed, but give it a try.

On my system writing 32MB takes less than 0.5 seconds, including Open and Close of the stream.

You will have to write your own for loops to write the data, like this:

struct Pair
{
    public double X, Y;
}

static void WritePairs(string filename, Pair[] data)
{
    using (var fs = System.IO.File.Create(filename))
    using (var bw = new System.IO.BinaryWriter(fs))
    {
        for (int i = 0; i < data.Length; i++)
        {
            bw.Write(data[i].X);
            bw.Write(data[i].Y);
        }
    }
}

static void ReadPairs(string fileName, Pair[] data)
{
    using (var fs = System.IO.File.OpenRead(fileName))
    using (var br = new System.IO.BinaryReader(fs))
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i].X = br.ReadDouble();
            data[i].Y = br.ReadDouble();
        }
    }
}
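On newer runtimes the per-element loop can be replaced by a single bulk copy. This is my own sketch, not part of the answer above; it assumes .NET Core 2.1 or later and that `Pair` is blittable (two doubles, no padding), so `MemoryMarshal.AsBytes` can reinterpret the whole array as raw bytes:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

struct Pair
{
    public double X, Y;
}

static class PairIO
{
    public static void WritePairsBulk(string filename, Pair[] data)
    {
        // View the struct array as a span of raw bytes; no per-element Write calls.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(data.AsSpan());
        using var fs = File.Create(filename);
        fs.Write(bytes);
    }

    public static void ReadPairsBulk(string filename, Pair[] data)
    {
        // Fill the struct array in place by reading into its byte view.
        Span<byte> bytes = MemoryMarshal.AsBytes(data.AsSpan());
        using var fs = File.OpenRead(filename);
        int read = 0;
        while (read < bytes.Length)
        {
            int n = fs.Read(bytes.Slice(read));
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
    }
}
```

On a little-endian machine this produces the same 32 MB file layout as the `BinaryWriter` loop above, but note the format is then tied to the machine's endianness and the struct's field layout, so it is only suitable for files read back on a compatible platform.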
Henk Holterman answered Sep 20 '22


Serializing means that metadata is added so that the data can be safely deserialized; that metadata is what causes the overhead. If you serialize the data yourself without any metadata, you end up with 16 MB of data:

foreach (double d in array) {
   byte[] bin = BitConverter.GetBytes(d);
   stream.Write(bin, 0, bin.Length);
}

This of course means that you also have to deserialize the data yourself:

using (BinaryReader reader = new BinaryReader(stream)) {
   for (int i = 0; i < array.Length; i++) {
      byte[] data = reader.ReadBytes(8);
      array[i] = BitConverter.ToDouble(data, 0);
   }
}
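For a plain `double[]`, the per-element `BitConverter` calls can also be avoided entirely. This is my own sketch, not part of this answer: `Buffer.BlockCopy` works on arrays of primitives and moves all the bytes in one call, producing the same metadata-free 16 MB file:

```csharp
using System;
using System.IO;

static class DoubleIO
{
    public static void WriteDoubles(string filename, double[] array)
    {
        // Raw byte copy of the whole array: exactly 8 bytes per double, no metadata.
        var bytes = new byte[array.Length * sizeof(double)];
        Buffer.BlockCopy(array, 0, bytes, 0, bytes.Length);
        File.WriteAllBytes(filename, bytes);
    }

    public static double[] ReadDoubles(string filename)
    {
        // Reverse copy: bytes back into a freshly allocated double array.
        byte[] bytes = File.ReadAllBytes(filename);
        var array = new double[bytes.Length / sizeof(double)];
        Buffer.BlockCopy(bytes, 0, array, 0, bytes.Length);
        return array;
    }
}
```

As with any raw-byte format, the file uses the machine's native endianness, so it should only be read back on a compatible platform.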
Guffa answered Sep 22 '22