Is it best to have many fields in a protobuf message or nested messages?

Tags: database-design, data-modeling, protocol-buffers, protobuf-net

I tried to find some recommendations on the web but could not find anything relevant.

Let's say that I am creating a protocol buffer message that will contain a lot of fields (50+). Is it best to keep all the fields at the same level or to organize them in sub-messages? Are there any performance impacts one way or the other?

Example:

message myMessage{
 string field1 = 1;
 string field2 = 2;
 ....
 string fieldn = n;
}

vs

message myMessage{
 SubMessage1 groupedfieldsbasedonsomebusinesslogic1 = 1;
 SubMessage2 groupedfieldsbasedonsomebusinesslogic2 = 2;

 message SubMessage1{
  string field1 = 1;
  string field2 = 2;
  ... 
  string fieldx = x;
 } 

 message SubMessage2{
  string fieldxplus1 = x+1;
  ... 
  string fieldn = n;
 }
}

I am not considering readability so much here, as there are pros and cons to having flat or nested data when deserializing. My question is really focused on the technical impacts.

asked Apr 05 '18 10:04

Antoine Lefebvre

1 Answer

There is no "best" - everything is contextual, and only you have most of the context.

However! Some minor thoughts on performance:

- a nested approach requires more objects; usually this is fine, unless your volumes are huge
- a nested approach may make it easier to understand the object model and the relationships between certain parts of the data
- a flat approach requires larger field numbers; field numbers 1-15 take a single-byte header; field numbers 16-2047 take a 2-byte header (and so on); in reality this extra byte for a few fields is unlikely to hurt you much, and is offset by the overhead of the alternative (nested) approach
- a nested approach requires a length prefix per sub-object, or a start/end token ("group" in the protocol); this isn't much in terms of extra size, but:
  - a length prefix requires the serializer to know the length in advance, which means either double-processing (a "compute length" sweep) or buffering; in most cases this isn't a big issue, but it may be problematic for very large sub-graphs
  - start/end tokens are something that Google has been trying to kill, and they are not well supported in all libraries (and IIRC they don't exist in "proto3" schemas); I still really like them though, in some cases :) protobuf-net (from the tags) supports the ability to encode arbitrary sub-data as groups, but that might be awkward if you need to go cross-platform later
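
As a rough illustration of the header sizes in the list above: on the wire, each field header (tag) is the varint encoding of `(field_number << 3) | wire_type`, so its size depends only on the field number. The sketch below computes that size in Python; `varint_size` and `tag_size` are helper names made up for this example and are not part of any protobuf library.

```python
def varint_size(value: int) -> int:
    """Number of bytes needed to encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("value must be non-negative")
    size = 1
    while value >= 0x80:  # each varint byte carries 7 bits of payload
        value >>= 7
        size += 1
    return size


def tag_size(field_number: int, wire_type: int = 2) -> int:
    """Size in bytes of a field tag.

    Wire type 2 is "length-delimited", which is what strings and
    sub-messages use; the wire type occupies the low 3 bits of the tag.
    """
    return varint_size((field_number << 3) | wire_type)


if __name__ == "__main__":
    # Hypothetical field numbers chosen around the size boundaries.
    for number in (1, 15, 16, 2047, 2048):
        print(number, tag_size(number))
```

For a flat message with 50 populated fields numbered 1 to 50, fields 16 to 50 each cost one extra header byte, so the difference is at most 35 bytes per message.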

Out of all of these things, the one that I would focus on, if it were me, is the second one.

Perhaps start with something that looks usable, and measure it for realistic data volumes; does it perform acceptably?
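
One cheap first measurement is encoded size. The sketch below hand-encodes string fields following the protobuf wire format and compares a flat layout with a nested one. It is only an approximation under stated assumptions: the function names are invented for this example, the sample data is made up, fields inside each sub-message are renumbered from 1, and it says nothing about CPU time or allocations. For real numbers, serialize your actual messages with the library you intend to use.

```python
from typing import Sequence


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_field(field_number: int, payload: bytes) -> bytes:
    """Length-delimited field (wire type 2): tag, length prefix, payload."""
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


def encode_flat(values: Sequence[str]) -> bytes:
    """All strings as fields 1..n of a single message."""
    return b"".join(
        encode_field(number, value.encode("utf-8"))
        for number, value in enumerate(values, start=1)
    )


def encode_nested(values: Sequence[str], group_size: int) -> bytes:
    """Strings split into sub-messages of up to group_size fields each."""
    out = bytearray()
    starts = range(0, len(values), group_size)
    for sub_number, start in enumerate(starts, start=1):
        # The sub-message is buffered in full so that its length is known
        # before the length prefix is written.
        sub_payload = encode_flat(values[start:start + group_size])
        out += encode_field(sub_number, sub_payload)
    return bytes(out)


if __name__ == "__main__":
    # Made-up sample data: 50 short strings.
    sample = [f"value{i}" for i in range(50)]
    print("flat bytes:  ", len(encode_flat(sample)))
    print("nested bytes:", len(encode_nested(sample, group_size=10)))
```

With short values like these, the two layouts end up within a few percent of each other, which is consistent with treating size as a minor factor in the decision.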

answered Dec 09 '22 18:12

Marc Gravell
