 

Google Protocol Buffers Serialization hangs writing 1GB+ data

I am serializing a large data set using protocol buffer serialization. When my data set contains 400,000 custom objects with a combined size of around 1 GB, serialization completes in 3-4 seconds. But when the data set contains 450,000 objects with a combined size of around 1.2 GB, the serialization call never returns and the CPU is constantly consumed.

I am using the .NET port of Protocol Buffers (protobuf-net).
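Roughly, the failing call looks like the following (a simplified sketch; MyRecord stands in for my actual custom type):

using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class MyRecord    // illustrative stand-in for the real custom type
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public byte[] Payload { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var records = new List<MyRecord>(); // populated with ~450,000 items elsewhere
        using (var ms = new MemoryStream())
        {
            // Completes in 3-4 seconds at ~1 GB, but never returns at ~1.2 GB.
            Serializer.Serialize(ms, records);
        }
    }
}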

asked Jun 15 '11 by muddxr

People also ask

How big can a Protobuf message be?

Protobuf has a hard limit of 2GB, because many implementations use 32-bit signed arithmetic. For security reasons, many implementations (especially the Google-provided ones) impose a size limit of 64MB by default, although you can increase this limit manually if you need to.
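For example, with the Google-provided C# package (Google.Protobuf), the limit can be raised when constructing the input stream. This is a sketch under assumptions: BigMessage is a hypothetical generated message type, and CodedInputStream.CreateWithLimits is used to lift the default limits:

using System.IO;
using Google.Protobuf;

public static class LimitExample
{
    public static void Load()
    {
        using (var file = File.OpenRead("data.bin"))
        {
            // Raise the size limit (second argument); 100 is the default recursion limit.
            var input = CodedInputStream.CreateWithLimits(file, int.MaxValue, 100);
            var message = BigMessage.Parser.ParseFrom(input); // BigMessage: hypothetical generated type
        }
    }
}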

Does Protobuf reduce size?

In the compressed environment, the results were quite similar for both Protobuf and JSON: Protobuf messages were 9% smaller than JSON messages, and they took only 4% less time to be available to the JavaScript code.

What is Protobuf serialization?

Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs to communicate with each other over a network or for storing data.
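As a minimal illustration using protobuf-net (the library discussed in this question), a round trip looks something like this; Person is a hypothetical example type:

using System.IO;
using ProtoBuf;

[ProtoContract]
public class Person   // hypothetical example type
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public int Age { get; set; }
}

public static class RoundTrip
{
    public static void Main()
    {
        var original = new Person { Name = "Ada", Age = 36 };
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, original);            // compact binary wire format
            ms.Position = 0;
            var copy = Serializer.Deserialize<Person>(ms); // reconstructs the object
        }
    }
}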

Is Protobuf faster than JSON?

TL;DR: encoding and decoding string-intensive data in JavaScript is faster with JSON than it is with protobuf. When you have structured data in JavaScript that needs to be sent over the network (to another microservice, for example) or saved to a storage system, it first needs to be serialized.


1 Answer

Looking at the new comments, this appears to be (as the OP notes) limited by MemoryStream capacity. A slight annoyance in the protobuf spec is that, since sub-message lengths are variable and must prefix the sub-message, it is often necessary to buffer portions until the length is known. This is fine for most reasonable graphs, but an exceptionally large graph can end up doing quite a bit of work in memory (the exception being the "root object has millions of direct children" scenario, which doesn't suffer from this).

If you aren't tied to a particular layout (perhaps due to .proto interop with an existing client), then a simple fix is as follows: on child (sub-object) properties (including lists / arrays of sub-objects), tell it to use "group" serialization. This is not the default layout, but it says "instead of using a length-prefix, use a start/end pair of tokens". The downside of this is that if your deserialization code doesn't know about a particular object, it takes longer to skip the field, as it can't just say "seek forwards 231413 bytes" - it instead has to walk the tokens to know when the object is finished. In most cases this isn't an issue at all, since your deserialization code fully expects that data.

To do this:

// Inside a [ProtoContract] type (names here match your own model):
[ProtoMember(1, DataFormat = DataFormat.Group)]
public SomeType SomeChild { get; set; }
....
[ProtoMember(4, DataFormat = DataFormat.Group)]
public List<SomeOtherType> SomeChildren { get { return someChildren; } }

The deserialization in protobuf-net is very forgiving by default (there is an optional strict mode), and it will happily deserialize groups in place of length-prefixes, and length-prefixes in place of groups (meaning: any data you have already stored somewhere should work fine).
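Putting it together, a fuller sketch (type and member names here are illustrative, following the fragment above):

using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class SomeType { [ProtoMember(1)] public int Value { get; set; } }

[ProtoContract]
public class SomeOtherType { [ProtoMember(1)] public string Name { get; set; } }

[ProtoContract]
public class Parent   // illustrative container type
{
    private readonly List<SomeOtherType> someChildren = new List<SomeOtherType>();

    // Group encoding writes start/end tokens instead of a length-prefix,
    // so the serializer does not need to buffer the child to measure it.
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public SomeType SomeChild { get; set; }

    [ProtoMember(4, DataFormat = DataFormat.Group)]
    public List<SomeOtherType> SomeChildren { get { return someChildren; } }
}

public static class Demo
{
    public static void Main()
    {
        var parent = new Parent { SomeChild = new SomeType { Value = 1 } };
        parent.SomeChildren.Add(new SomeOtherType { Name = "child" });

        using (var file = File.Create("graph.bin"))
        {
            Serializer.Serialize(file, parent); // no large in-memory buffering for groups
        }
    }
}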

answered Sep 23 '22 by Marc Gravell