My code is like this:
[Serializable]
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Foo
{
    public byte Bar;
    public Foo(byte b) { Bar = b; }
}
public static void Main(string[] args)
{
    Foo[] arr = new Foo[1000];
    for (int i = 0; i < 1000; i++) {
        arr[i] = new Foo(42);
    }

    var fmt = new BinaryFormatter();
    using (FileStream f = File.Create("test.bin")) {
        fmt.Serialize(f, arr);
    }

    Console.WriteLine(new FileInfo("test.bin").Length);
}
Why do the serialized Foo structs eat so many bytes? What's the roughly 9 bytes of overhead per struct all about?

PS: I'm writing a lookup library for Chinese characters (it holds info on roughly 70,000 characters), for which db4o or other embeddable databases (like SQLite) feel like bloat. I considered storing all the information as plain strings, which is the most memory-friendly option but less flexible. I'd rather keep the info in lists and store them via binary serialization inside an archive; I've chosen DotNetZip for the archiving. But the serialization overhead is an unexpected obstacle. A better serialization solution would be welcome; otherwise I'll have to save the info as plain strings and parse them by hand.
It's not the Foo structure that is so "big"; what you're observing is the overhead of the binary serialization format itself. The format contains a header, information describing the object graph, information describing the array, strings naming the type and assembly, and so on. In other words, it contains enough information for BinaryFormatter.Deserialize to give you back an array of Foo as you would expect.
For more information, here is the spec that describes the format in detail: http://msdn.microsoft.com/en-us/library/cc236844(PROT.10).aspx
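If you're curious how that overhead splits into a fixed header versus a per-element cost, a quick experiment is to serialize arrays of two different lengths and compare the sizes. SerializedSize is a helper name I made up for this sketch; it reuses the Foo struct from your question:

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static long SerializedSize(int count)
{
    var fmt = new BinaryFormatter();
    using (var ms = new MemoryStream())
    {
        fmt.Serialize(ms, new Foo[count]);
        return ms.Length; // total bytes the formatter produced
    }
}

// The size difference between the two lengths isolates the
// per-element cost; what's left over is the fixed header.
long s1 = SerializedSize(1000);
long s2 = SerializedSize(2000);
Console.WriteLine("per element: " + (s2 - s1) / 1000.0 + " bytes");
Console.WriteLine("fixed overhead: " + (2 * s1 - s2) + " bytes");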
Edit based on your updated question:
If you simply want to write the contents of the structures to the stream, this can easily be accomplished in an unsafe context (the code below is based on your example).
Using a small array to write out each Foo:
// Requires: using System.Runtime.InteropServices;
// and compiling with /unsafe.
unsafe
{
    byte[] data = new byte[sizeof(Foo)];
    fixed (Foo* ptr = arr)
    {
        for (int i = 0; i < arr.Length; ++i)
        {
            // ptr + i advances in whole-Foo units; copy one
            // struct's bytes into the temporary buffer.
            Marshal.Copy((IntPtr)(ptr + i), data, 0, data.Length);
            f.Write(data, 0, data.Length);
        }
    }
}
Or using a single large-enough array to write out all Foos:
unsafe
{
    byte[] data = new byte[sizeof(Foo) * arr.Length];
    fixed (Foo* ptr = arr)
    {
        // Copy the whole array's bytes in one shot, then write them.
        Marshal.Copy((IntPtr)ptr, data, 0, data.Length);
        f.Write(data, 0, data.Length);
    }
}
Based on your example, this would write out 1000 bytes with a value of 42 each.
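For completeness, here is a minimal sketch of the reverse direction, reading the raw bytes back into a Foo[]; it assumes the file contains nothing but tightly packed Foo structures:

unsafe
{
    byte[] data = File.ReadAllBytes("test.bin");
    var result = new Foo[data.Length / sizeof(Foo)];
    fixed (Foo* ptr = result)
    {
        // Copy the raw bytes straight over the struct array.
        Marshal.Copy(data, 0, (IntPtr)ptr, data.Length);
    }
}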
However, this approach has a few drawbacks. If you're familiar with writing out structures in a language like C, some of these should be obvious: the byte layout depends on the endianness and struct packing of the machine that wrote the data, the stream carries no type or version information, and any change to the struct definition silently breaks old files.

BinaryFormatter solves these problems for you, but incurs the space overhead you've observed while doing so. It is designed to exchange data between machines in a safe manner. If you do not wish to use BinaryFormatter, you will need to either define your own file format and handle the reading and writing of that format yourself, or use a third-party serialization library that best suits your needs (I'll leave the research of such libraries up to you). A sketch of the roll-your-own route follows.
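As an illustration of the roll-your-own option, here is a minimal hand-written format: a 4-byte element count followed by one byte per Foo. The layout is my own invention for this sketch, not an established format:

// Write: a 4-byte length prefix, then one byte per Foo.
using (var w = new BinaryWriter(File.Create("test.bin")))
{
    w.Write(arr.Length);
    foreach (Foo foo in arr)
        w.Write(foo.Bar);
}

// Read it back.
using (var r = new BinaryReader(File.OpenRead("test.bin")))
{
    int count = r.ReadInt32();
    var result = new Foo[count];
    for (int i = 0; i < count; i++)
        result[i] = new Foo(r.ReadByte());
}

For your 1000-element example this writes 1,004 bytes: 4 for the count plus one per struct.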
If you want to measure how much memory the structs consume in memory (rather than on disk), you can use code like this instead:
long nTotalMem1 = System.GC.GetTotalMemory(true);

Foo[] arr = new Foo[1000];
for (int i = 0; i < 1000; i++)
{
    arr[i] = new Foo(42);
}

long nTotalMem2 = System.GC.GetTotalMemory(true);
Console.WriteLine("Memory consumption: " + (nTotalMem2 - nTotalMem1) + " bytes");
Spoiler: 1012 bytes: 1,000 bytes of payload plus the array object's own header. :)
Edit: a more reliable way may be to use the Marshal.SizeOf method:
Console.WriteLine("Size of one instance: " + Marshal.SizeOf(arr[0]) + " bytes");
This returned a 1-byte result for me, and when I added another field to the struct it returned 2 bytes, so it looks pretty reliable.
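Note that Marshal.SizeOf reports the marshaled layout, so the Pack = 1 attribute matters once the struct has more than one field of mixed sizes. A small sketch (the struct names are made up for illustration):

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Packed { public byte B; public int I; }

[StructLayout(LayoutKind.Sequential)]
struct Unpacked { public byte B; public int I; }

class SizeDemo
{
    static void Main()
    {
        // Pack = 1 removes the 3 bytes of padding between B and I.
        Console.WriteLine(Marshal.SizeOf(typeof(Packed)));   // 5
        Console.WriteLine(Marshal.SizeOf(typeof(Unpacked))); // 8
    }
}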