In our application we have some data structures which amongst other things contain a chunked list of bytes (currently exposed as a List<byte[]>
). We chunk bytes up because if we allow the byte arrays to be put on the large object heap then over time we suffer from memory fragmentation.
We've also started using Protobuf-net to serialize these structures, using our own generated serialization DLL.
However we've noticed that Protobuf-net is creating very large in-memory buffers while serializing. Glancing through the source code it appears that perhaps it can't flush its internal buffer until the entire List<byte[]>
structure has been written because it needs to write the total length at the front of the buffer afterwards.
This unfortunately undoes our work with chunking the bytes in the first place, and eventually gives us OutOfMemoryExceptions due to memory fragmentation (the exception occurs at the time where Protobuf-net is trying to expand the buffer to over 84k, which obviously puts it on the LOH, and our overall process memory usage is fairly low).
If my analysis of how Protobuf-net is working is correct, is there a way around this issue?
Update
Based on Marc's answer, here is what I've tried:
[ProtoContract]
[ProtoInclude(1, typeof(A), DataFormat = DataFormat.Group)]
public class ABase
{
}
[ProtoContract]
public class A : ABase
{
[ProtoMember(1, DataFormat = DataFormat.Group)]
public B B
{
get;
set;
}
}
[ProtoContract]
public class B
{
[ProtoMember(1, DataFormat = DataFormat.Group)]
public List<byte[]> Data
{
get;
set;
}
}
Then to serialize it:
var a = new A();
var b = new B();
a.B = b;
b.Data = new List<byte[]>
{
Enumerable.Range(0, 1999).Select(v => (byte)v).ToArray(),
Enumerable.Range(2000, 3999).Select(v => (byte)v).ToArray(),
};
var stream = new MemoryStream();
Serializer.Serialize(stream, a);
However if I stick a breakpoint in ProtoWriter.WriteBytes()
where it calls DemandSpace()
towards the bottom of the method and step into DemandSpace()
, I can see that the buffer isn't being flushed because writer.flushLock
equals 1
.
If I create another base class for ABase like this:
[ProtoContract]
[ProtoInclude(1, typeof(ABase), DataFormat = DataFormat.Group)]
public class ABaseBase
{
}
[ProtoContract]
[ProtoInclude(1, typeof(A), DataFormat = DataFormat.Group)]
public class ABase : ABaseBase
{
}
Then writer.flushLock
equals 2
in DemandSpace()
.
I'm guessing there is an obvious step I've missed here to do with derived types?
Additional re your edit; the [ProtoInclude(..., DataFormat=...)]
looks like it simply wasn't being processed. I have added a test for this in my current local build, and it now passes:
[Test]
public void Execute()
{
var a = new A();
var b = new B();
a.B = b;
b.Data = new List<byte[]>
{
Enumerable.Range(0, 1999).Select(v => (byte)v).ToArray(),
Enumerable.Range(2000, 3999).Select(v => (byte)v).ToArray(),
};
var stream = new MemoryStream();
var model = TypeModel.Create();
model.AutoCompile = false;
#if DEBUG // this is only available in debug builds; if set, an exception is
// thrown if the stream tries to buffer
model.ForwardsOnly = true;
#endif
CheckClone(model, a);
model.CompileInPlace();
CheckClone(model, a);
CheckClone(model.Compile(), a);
}
void CheckClone(TypeModel model, A original)
{
int sum = original.B.Data.Sum(x => x.Sum(b => (int)b));
var clone = (A)model.DeepClone(original);
Assert.IsInstanceOfType(typeof(A), clone);
Assert.IsInstanceOfType(typeof(B), clone.B);
Assert.AreEqual(sum, clone.B.Data.Sum(x => x.Sum(b => (int)b)));
}
This commit is tied into some other, unrelated refactorings (some rework for WinRT / IKVM), but should be committed ASAP.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With