
Memory usage serializing chunked byte arrays with Protobuf-net

In our application we have some data structures which amongst other things contain a chunked list of bytes (currently exposed as a List<byte[]>). We chunk bytes up because if we allow the byte arrays to be put on the large object heap then over time we suffer from memory fragmentation.
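To make the chunking idea concrete, here is a minimal sketch of reading a stream into a List<byte[]> whose arrays all stay below the large object heap threshold. The class name `ByteChunker`, the method `ReadChunks`, and the 64 KB chunk size are hypothetical choices for illustration, not our actual code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ByteChunker
{
    // Assumption: 64 KB per chunk, comfortably below the 85,000-byte LOH threshold.
    public const int DefaultChunkSize = 64 * 1024;

    // Reads the whole stream into chunks of at most chunkSize bytes each.
    public static List<byte[]> ReadChunks(Stream input, int chunkSize = DefaultChunkSize)
    {
        if (input == null) throw new ArgumentNullException("input");
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        var chunks = new List<byte[]>();
        var buffer = new byte[chunkSize];
        while (true)
        {
            // Stream.Read may return fewer bytes than requested, so keep
            // reading until the buffer is full or the stream ends.
            int filled = 0;
            while (filled < chunkSize)
            {
                int read = input.Read(buffer, filled, chunkSize - filled);
                if (read == 0) break;
                filled += read;
            }
            if (filled == 0) break;

            // Copy into an exactly sized array so the last chunk is not padded.
            var chunk = new byte[filled];
            Buffer.BlockCopy(buffer, 0, chunk, 0, filled);
            chunks.Add(chunk);

            if (filled < chunkSize) break; // end of stream reached
        }
        return chunks;
    }
}
```

No single array in the result exceeds the chunk size, so none of them should land on the large object heap by itself.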

We've also started using Protobuf-net to serialize these structures, using our own generated serialization DLL.

However, we've noticed that Protobuf-net creates very large in-memory buffers while serializing. Glancing through the source code, it appears that it can't flush its internal buffer until the entire List<byte[]> structure has been written, because it needs to go back and write the total length at the front of the buffer afterwards.

This unfortunately undoes our work chunking the bytes in the first place, and eventually gives us OutOfMemoryExceptions due to memory fragmentation. The exception occurs when Protobuf-net tries to expand the buffer to over 84k, which puts it on the LOH, even though our overall process memory usage is fairly low.
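For reference, a small probe along these lines can show whether a given allocation ended up on the LOH. It is only a sketch: `LohProbe` is a hypothetical helper, and it relies on the runtime reporting large-object-heap allocations as the highest generation.

```csharp
using System;

public static class LohProbe
{
    // Returns true when the array is reported in the highest GC generation.
    // Only meaningful for a freshly allocated array: a small array that has
    // survived enough collections is also promoted to that generation.
    public static bool IsOnLargeObjectHeap(byte[] buffer)
    {
        return GC.GetGeneration(buffer) == GC.MaxGeneration;
    }
}
```

Calling `LohProbe.IsOnLargeObjectHeap(new byte[128 * 1024])` should report true, whereas a freshly allocated 64 KB array normally starts in generation 0.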

If my analysis of how Protobuf-net is working is correct, is there a way around this issue?
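To illustrate the buffering concern, here is a simplified sketch of the two ways the protobuf wire format can delimit a nested message. It is not Protobuf-net's code; `WireSketch` and its methods are hypothetical. A length-prefixed sub-message needs its total size before the first payload byte can be written, whereas a group is bracketed by start and end tags and can be streamed forwards.

```csharp
using System.Collections.Generic;
using System.IO;

public static class WireSketch
{
    // Protobuf wire types: 2 = length-delimited, 3 = start group, 4 = end group.
    const int LengthDelimited = 2, StartGroup = 3, EndGroup = 4;

    static void WriteVarint(Stream s, uint value)
    {
        while (value >= 0x80)
        {
            s.WriteByte((byte)(value | 0x80)); // low 7 bits plus continuation bit
            value >>= 7;
        }
        s.WriteByte((byte)value);
    }

    static void WriteTag(Stream s, int field, int wireType)
    {
        WriteVarint(s, (uint)((field << 3) | wireType));
    }

    // Each chunk is a bytes field whose length is known up front,
    // so it never needs buffering of its own.
    static void WriteChunks(Stream s, List<byte[]> chunks)
    {
        foreach (var chunk in chunks)
        {
            WriteTag(s, 1, LengthDelimited);
            WriteVarint(s, (uint)chunk.Length);
            s.Write(chunk, 0, chunk.Length);
        }
    }

    // Length-prefixed: the whole body is buffered so its size can be
    // written in front of it.
    public static void WriteLengthPrefixed(Stream output, int field, List<byte[]> chunks)
    {
        var body = new MemoryStream();
        WriteChunks(body, chunks);
        WriteTag(output, field, LengthDelimited);
        WriteVarint(output, (uint)body.Length);
        body.WriteTo(output);
    }

    // Group: start tag, content, end tag; nothing has to be held back.
    public static void WriteGroup(Stream output, int field, List<byte[]> chunks)
    {
        WriteTag(output, field, StartGroup);
        WriteChunks(output, chunks);
        WriteTag(output, field, EndGroup);
    }
}
```

In the length-prefixed version the intermediate MemoryStream grows with the total payload, which is the kind of large buffer described above; the group version only ever hands one chunk at a time to the output stream.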


Update

Based on Marc's answer, here is what I've tried:

[ProtoContract]
[ProtoInclude(1, typeof(A), DataFormat = DataFormat.Group)]
public class ABase
{
}

[ProtoContract]
public class A : ABase
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public B B
    {
        get;
        set;
    }
}

[ProtoContract]
public class B
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<byte[]> Data
    {
        get;
        set;
    }
}

Then to serialize it:

var a = new A();
var b = new B();
a.B = b;
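b.Data = new List<byte[]>
{
    // Enumerable.Range takes (start, count), so these arrays hold 1999 and 3999 bytes;
    // the (byte) cast wraps values above 255 in the default unchecked context.
    Enumerable.Range(0, 1999).Select(v => (byte)v).ToArray(),
    Enumerable.Range(2000, 3999).Select(v => (byte)v).ToArray(),
};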

var stream = new MemoryStream();
Serializer.Serialize(stream, a);

However, if I put a breakpoint in ProtoWriter.WriteBytes() where it calls DemandSpace() towards the bottom of the method, and step into DemandSpace(), I can see that the buffer isn't being flushed because writer.flushLock equals 1.

If I create another base class for ABase like this:

[ProtoContract]
[ProtoInclude(1, typeof(ABase), DataFormat = DataFormat.Group)]
public class ABaseBase
{
}

[ProtoContract]
[ProtoInclude(1, typeof(A), DataFormat = DataFormat.Group)]
public class ABase : ABaseBase
{
}

Then writer.flushLock equals 2 in DemandSpace().

I'm guessing there is an obvious step I've missed here to do with derived types?

James Thurley asked Jul 03 '12 18:07


1 Answer

Regarding your edit: it looks like the DataFormat on [ProtoInclude(..., DataFormat=...)] simply wasn't being processed. I have added a test for this in my current local build, and it now passes:

[Test]
public void Execute()
{

    var a = new A();
    var b = new B();
    a.B = b;

    b.Data = new List<byte[]>
    {
        Enumerable.Range(0, 1999).Select(v => (byte)v).ToArray(),
        Enumerable.Range(2000, 3999).Select(v => (byte)v).ToArray(),
    };

    var stream = new MemoryStream();
    var model = TypeModel.Create();
    model.AutoCompile = false;
#if DEBUG
    // this is only available in debug builds; if set, an exception is
    // thrown if the stream tries to buffer
    model.ForwardsOnly = true;
#endif
    CheckClone(model, a);
    model.CompileInPlace();
    CheckClone(model, a);
    CheckClone(model.Compile(), a);
}
void CheckClone(TypeModel model, A original)
{
    int sum = original.B.Data.Sum(x => x.Sum(b => (int)b));
    var clone = (A)model.DeepClone(original);
    Assert.IsInstanceOfType(typeof(A), clone);
    Assert.IsInstanceOfType(typeof(B), clone.B);
    Assert.AreEqual(sum, clone.B.Data.Sum(x => x.Sum(b => (int)b)));
}

This change is tied up with some other, unrelated refactorings (some rework for WinRT / IKVM), but should be committed ASAP.

Marc Gravell answered Oct 31 '22 10:10