About c# struct memory/serialization overhead

Tags: c#

My code is like this:

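using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Foo
{
    public byte Bar;
    public Foo(byte b) { Bar = b; }
}

class Program
{
    public static void Main(string[] args)
    {
        // Fill an array with 1000 one-byte structs.
        Foo[] arr = new Foo[1000];
        for (int i = 0; i < 1000; i++)
        {
            arr[i] = new Foo(42);
        }
        // Serialize the whole array and print the resulting file size.
        var fmt = new BinaryFormatter();
        using (FileStream f = File.Create("test.bin"))
        {
            fmt.Serialize(f, arr);
        }
        Console.WriteLine(new FileInfo("test.bin").Length);
    }
}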

The resulting test.bin file is 10095 bytes. Why do my Foo structs take up so many bytes? What is the overhead of roughly 9 bytes per struct for?

PS: I'm writing a lookup library for Chinese characters (it holds information on circa 70,000 characters), for which db4o or other embeddable databases (like sqlite) are rather bloated. I thought of storing all the information in pure string format, which is the most memory friendly but less flexible. I'd like to keep the information in lists and store them, binary serialized, in an archive; I've chosen DotNetZip for the archiving. But the serialization overhead is an unexpected obstacle. A better serialization solution would be good; otherwise I have to save the information in plain string format and hard-code the parsing.

Need4Steed asked Mar 06 '11 07:03


2 Answers

The Foo structure itself is not "big"; what you're observing is the overhead of the binary serialization format. This format contains a header, information that describes the object graph, information that describes the array, strings that describe type and assembly information, etc. In other words, it contains enough information for BinaryFormatter.Deserialize to give you back an array of Foo as you would expect.

For more information, here is the spec that describes the format in detail: http://msdn.microsoft.com/en-us/library/cc236844(PROT.10).aspx

Edit based on your updated question:

If you simply want to write the contents of the structures to the stream, this can be done in an unsafe context (this code is based on your example; arr and f are the array and the FileStream from it).

Using a small array to write out each Foo:

unsafe 
{
    byte[] data = new byte[sizeof(Foo)];

    fixed (Foo* ptr = arr)
    {
        for (int i = 0; i < arr.Length; ++i)
        {
            // Advance by whole Foo elements; (IntPtr)ptr + i would advance by i bytes.
            Marshal.Copy((IntPtr)(ptr + i), data, 0, data.Length);
            f.Write(data, 0, data.Length);
        }
    }
}

Or using a single large-enough array to write out all Foos:

unsafe 
{
    byte[] data = new byte[sizeof(Foo) * arr.Length];

    fixed (Foo* ptr = arr)
    {
        Marshal.Copy((IntPtr)ptr, data, 0, data.Length);
        f.Write(data, 0, data.Length);
    }
}

Based on your example, this would write out 1000 bytes with a value of 42 each.
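On newer runtimes the same raw dump can be done without an unsafe block. The following is only a minimal sketch, assuming .NET Core 2.1 or later (where MemoryMarshal and the span overload of Stream.Write are available). The RawFooFile class and its method names are made up for illustration and are not part of the original answer. It has the same portability caveats as the unsafe version.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Foo
{
    public byte Bar;
    public Foo(byte b) { Bar = b; }
}

// Hypothetical helper, named for this sketch only.
static class RawFooFile
{
    // Writes the array's in-memory bytes to the file and returns the byte count.
    public static long Write(string path, Foo[] items)
    {
        // Reinterpret the Foo[] as bytes without copying.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(items.AsSpan());
        using (FileStream f = File.Create(path))
        {
            f.Write(bytes);
        }
        return bytes.Length;
    }

    // Reads the file back and reinterprets its bytes as Foo values.
    public static Foo[] Read(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return MemoryMarshal.Cast<byte, Foo>(bytes.AsSpan()).ToArray();
    }
}

static class Program
{
    static void Main()
    {
        Foo[] arr = new Foo[1000];
        for (int i = 0; i < arr.Length; i++)
        {
            arr[i] = new Foo(42);
        }

        long written = RawFooFile.Write("test.bin", arr);
        Foo[] back = RawFooFile.Read("test.bin");
        Console.WriteLine(written + " bytes written, " + back.Length + " structs read back");
    }
}
```

Like the fixed/Marshal.Copy version, this only works for structs without reference-type fields, and the file layout is whatever the struct layout happens to be on the writing machine.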

However, this approach has a few drawbacks. If you're familiar with writing out structures in a language like C, some of these should be obvious:

  • If you read the data on a machine with a different endianness than the one you used to write the data, you will not get the desired results. You would need to define an expected byte order and handle conversion to-and-from this order yourself.
  • Foo cannot contain fields that are reference types. That means you would need to use a length field + fixed-size buffer of char instead of System.String; that can be a real pain.
  • If Foo contains pointer types or IntPtr/UIntPtr, then the size of the structure may differ between machine architectures. You would want to avoid using these types if at all possible.
  • You would need to apply your own versioning scheme so that you can have some level of confidence that the data being read back in matches the expected structure definition. Any change to the layout of the structure would require a new version.
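One way to work around several of these drawbacks is a small hand-written format. The sketch below is an assumption-laden illustration, not a recommendation from the original answer: the Entry type, the EntryFile class, the magic number, and the field names are all hypothetical. It relies on BinaryWriter, which writes multi-byte integers in little-endian order regardless of the machine and writes strings with a length prefix, and it puts a magic number and a version in front of the data so that a reader can reject files it does not understand.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical record type, loosely modelled on the question's character data.
sealed class Entry
{
    public int CodePoint;
    public string Reading = "";
}

// Hypothetical reader/writer for a tiny versioned file format.
static class EntryFile
{
    const uint Magic = 0x31544E45;  // arbitrary file tag chosen for this sketch
    const ushort Version = 1;       // bump whenever the record layout changes

    public static void Save(Stream output, IReadOnlyList<Entry> entries)
    {
        using (var w = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            // Header: magic, format version, record count.
            w.Write(Magic);
            w.Write(Version);
            w.Write(entries.Count);
            foreach (Entry e in entries)
            {
                w.Write(e.CodePoint);  // 4 bytes, little-endian
                w.Write(e.Reading);    // length-prefixed UTF-8 string
            }
        }
    }

    public static List<Entry> Load(Stream input)
    {
        using (var r = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            if (r.ReadUInt32() != Magic)
                throw new InvalidDataException("Not an entry file.");
            ushort version = r.ReadUInt16();
            if (version != Version)
                throw new InvalidDataException("Unsupported version " + version);

            int count = r.ReadInt32();
            var result = new List<Entry>(count);
            for (int i = 0; i < count; i++)
            {
                result.Add(new Entry { CodePoint = r.ReadInt32(), Reading = r.ReadString() });
            }
            return result;
        }
    }
}

static class Program
{
    static void Main()
    {
        var entries = new List<Entry>
        {
            new Entry { CodePoint = 0x4E2D, Reading = "zhong" },
            new Entry { CodePoint = 0x6587, Reading = "wen" },
        };

        using (var ms = new MemoryStream())
        {
            EntryFile.Save(ms, entries);
            Console.WriteLine("Serialized size: " + ms.Length + " bytes");

            ms.Position = 0;
            List<Entry> back = EntryFile.Load(ms);
            Console.WriteLine("Read back " + back.Count + " entries");
        }
    }
}
```

The per-record cost here is just the fields themselves plus the string length prefix, at the price of having to maintain Save and Load by hand whenever the record changes.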

BinaryFormatter solves these problems for you, but incurs the space overhead you've observed while doing so. It is designed to exchange data between machines in a safe manner. If you do not wish to use BinaryFormatter, then you will need to either define your own file format and handle the reading and writing of this format yourself or use a third-party serialization library that best suits your needs (I'll leave the research of such libraries up to you).

Peter Huene answered Sep 16 '22 13:09


If you want to measure how much memory is consumed, you can use code like this instead:

long nTotalMem1 = System.GC.GetTotalMemory(true);
Foo[] arr = new Foo[1000];
for (int i = 0; i < 1000; i++)
{
    arr[i] = new Foo(42);
}
long nTotalMem2 = System.GC.GetTotalMemory(true);
Console.WriteLine("Memory consumption: " + (nTotalMem2 - nTotalMem1) + " bytes");

Spoiler: 1012 bytes. :)
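That figure would be consistent with 1000 bytes of data plus a small fixed array header, so there is no per-struct overhead in memory. GC.GetTotalMemory is only an estimate and can be disturbed by unrelated allocations or collections, so here is a hedged alternative sketch that counts the bytes allocated by the current thread instead. It assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread (current .NET versions do). The exact number it prints depends on the runtime and on whether the process is 32-bit or 64-bit.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Foo
{
    public byte Bar;
    public Foo(byte b) { Bar = b; }
}

static class Program
{
    // Returns the number of bytes the current thread allocated for one Foo[1000].
    public static long MeasureArrayAllocation()
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        Foo[] arr = new Foo[1000];
        long after = GC.GetAllocatedBytesForCurrentThread();

        // Filling the array writes into existing storage and allocates nothing more.
        for (int i = 0; i < arr.Length; i++)
        {
            arr[i] = new Foo(42);
        }
        GC.KeepAlive(arr);
        return after - before;
    }

    static void Main()
    {
        MeasureArrayAllocation();  // warm-up call so one-time runtime work is not counted
        Console.WriteLine("Allocated for the array: " + MeasureArrayAllocation() + " bytes");
    }
}
```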

Edit: a possibly more reliable way is to use the Marshal.SizeOf method:

Console.WriteLine("Size of one instance: " + Marshal.SizeOf(arr[0]) + " bytes");

This returned 1 byte for me, and after adding another field to the struct it returned 2 bytes, so it looks pretty reliable.
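One caveat worth knowing: Marshal.SizeOf reports the size of the struct's marshaled (unmanaged) representation, which is not always the size it occupies in managed memory. The sketch below compares the two, assuming a runtime where System.Runtime.CompilerServices.Unsafe is available (it is built into .NET Core 3.0 and later). The Flag struct is a hypothetical example added for contrast: a bool field is marshaled as a 4-byte value by default but takes 1 byte in managed memory.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Foo
{
    public byte Bar;
}

// Hypothetical struct, used only to show where the two sizes diverge.
struct Flag
{
    public bool Value;
}

static class Program
{
    static void Main()
    {
        // For Foo both measurements agree.
        Console.WriteLine("Foo:  marshaled " + Marshal.SizeOf<Foo>()
            + ", managed " + Unsafe.SizeOf<Foo>());

        // For Flag they differ because of how bool is marshaled by default.
        Console.WriteLine("Flag: marshaled " + Marshal.SizeOf<Flag>()
            + ", managed " + Unsafe.SizeOf<Flag>());
    }
}
```

For a struct made only of byte fields, like Foo, the two numbers match, which is why the measurement above looked consistent.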

Shadow Wizard Hates Omicron answered Sep 18 '22 13:09