Is there ever a good time to use int32 instead of sint32 in Google Protocol Buffers?

Tags: protocol-buffers, primitive

I've been reading up on Google Protocol Buffers recently, which allow a variety of scalar value types to be used in messages.

According to the documentation, there are three variable-length 32-bit integer primitives: int32, uint32, and sint32. The documentation notes that int32 is "Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead." But if a field never holds negative numbers, I assume that uint32 would be a better type to use than int32 anyway (due to the extra bit and decreased CPU cost of processing negative numbers).

So when would int32 be a good scalar to use? Is the documentation implying that it's most efficient only when you rarely get negative numbers? Or is it always preferable to use sint32 and uint32, depending on the contents of the field?

(The same questions apply to the 64-bit versions of these scalars as well: int64, uint64, and sint64; but I left them out of the problem description for readability's sake.)

asked Apr 19 '09 19:04

Dan Lew

1 Answer

I'm not familiar with Google Protocol Buffers, but my interpretation of the documentation is:

use uint32 if the value cannot be negative
use sint32 if the value is pretty much as likely to be negative as not (for some fuzzy definition of "as likely to be")
use int32 if the value could be negative, but that's much less likely than the value being positive (for example, if the application sometimes uses -1 to indicate an error or 'unknown' value and this is a relatively uncommon situation)

Here's what the docs have to say about the encodings (http://code.google.com/apis/protocolbuffers/docs/encoding.html#types):

there is an important difference between the signed int types (sint32 and sint64) and the "standard" int types (int32 and int64) when it comes to encoding negative numbers. If you use int32 or int64 as the type for a negative number, the resulting varint is always ten bytes long – it is, effectively, treated like a very large unsigned integer. If you use one of the signed types, the resulting varint uses ZigZag encoding, which is much more efficient.

ZigZag encoding maps signed integers to unsigned integers so that numbers with a small absolute value (for instance, -1) have a small varint encoded value too. It does this in a way that "zig-zags" back and forth through the positive and negative integers, so that -1 is encoded as 1, 1 is encoded as 2, -2 is encoded as 3, and so on...
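To make the quoted description concrete, here is a minimal Python sketch of the two encodings. It is not the protobuf library's code; the function names (encode_varint, zigzag32, encode_int32, encode_sint32) are made up for illustration, and only the value bytes are produced (no field tag).

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    if value < 0:
        raise ValueError("varints encode unsigned values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def encode_int32(n: int) -> bytes:
    """int32 style: a negative value is sign-extended to 64 bits, then varint-encoded."""
    return encode_varint(n & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(n: int) -> bytes:
    """sint32 style: ZigZag-map first, then varint-encode."""
    return encode_varint(zigzag32(n))


if __name__ == "__main__":
    for n in (0, -1, 1, -2, 300):
        print(n, len(encode_int32(n)), len(encode_sint32(n)))
```

With this sketch, -1 takes ten bytes in the int32 style and a single byte in the sint32 style, which matches the behaviour the docs describe.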

So it looks like even if your use of negative numbers is rare, as long as the magnitude of the numbers (including non-negative numbers) you're passing in the protocol is on the smaller side, you might be better off using sint32. If you're unsure, profiling would be in order.
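As a rough way to check this before profiling for real, the sketch below totals the encoded value sizes for a sample of field values under each type. The helper names (varint_size, int32_size, sint32_size, total_sizes) and the sample data are assumptions for illustration; field tags are ignored, since they cost the same for both types.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def int32_size(n: int) -> int:
    # Negative values are sign-extended to 64 bits before encoding.
    return varint_size(n & 0xFFFFFFFFFFFFFFFF)


def sint32_size(n: int) -> int:
    # ZigZag maps small magnitudes to small unsigned values.
    return varint_size(((n << 1) ^ (n >> 31)) & 0xFFFFFFFF)


def total_sizes(values):
    """Return (total bytes as int32, total bytes as sint32) for the sample."""
    return (sum(int32_size(v) for v in values),
            sum(sint32_size(v) for v in values))


if __name__ == "__main__":
    # Hypothetical samples: mostly positive values, with a rare -1 sentinel.
    print(total_sizes([10] * 99 + [-1]))   # (109, 100): sint32 is smaller
    print(total_sizes([100] * 99 + [-1]))  # (109, 199): int32 is smaller
```

The second sample shows the catch: ZigZag doubles positive values, so values from 64 to 127 need two bytes as sint32 but only one as int32. Whether a rare ten-byte negative outweighs that depends on the actual distribution of values.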

answered Sep 24 '22 04:09

Michael Burr
