
How cross-platform is Google's Protocol Buffers' handling of floating-point types in practice?

Google's Protocol Buffers allows you to store floats and doubles in messages. I looked through the implementation source code wondering how they managed to do this in a cross-platform manner, and what I stumbled upon was:

    inline uint32 WireFormatLite::EncodeFloat(float value) {
      union {float f; uint32 i;};
      f = value;
      return i;
    }

    inline float WireFormatLite::DecodeFloat(uint32 value) {
      union {float f; uint32 i;};
      i = value;
      return f;
    }

    inline uint64 WireFormatLite::EncodeDouble(double value) {
      union {double f; uint64 i;};
      f = value;
      return i;
    }

    inline double WireFormatLite::DecodeDouble(uint64 value) {
      union {double f; uint64 i;};
      i = value;
      return f;
    }

Importantly, these routines are not the end of the process. Their result is post-processed to put the bytes in little-endian order:

    inline void WireFormatLite::WriteFloatNoTag(float value,
                                                io::CodedOutputStream* output) {
      output->WriteLittleEndian32(EncodeFloat(value));
    }

    inline void WireFormatLite::WriteDoubleNoTag(double value,
                                                 io::CodedOutputStream* output) {
      output->WriteLittleEndian64(EncodeDouble(value));
    }

    template <>
    inline bool WireFormatLite::ReadPrimitive<float, WireFormatLite::TYPE_FLOAT>(
        io::CodedInputStream* input,
        float* value) {
      uint32 temp;
      if (!input->ReadLittleEndian32(&temp)) return false;
      *value = DecodeFloat(temp);
      return true;
    }

    template <>
    inline bool WireFormatLite::ReadPrimitive<double, WireFormatLite::TYPE_DOUBLE>(
        io::CodedInputStream* input,
        double* value) {
      uint64 temp;
      if (!input->ReadLittleEndian64(&temp)) return false;
      *value = DecodeDouble(temp);
      return true;
    }
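For readers who want to experiment with the write/read path outside the library, here is a minimal standalone sketch of the same idea. It is not the Protocol Buffers implementation: the function names (`EncodeFloatBits`, `DecodeFloatBits`, `StoreLittleEndian32`, `LoadLittleEndian32`) are made up for illustration, it uses `memcpy` in place of the union, and it assumes a 32-bit IEEE-754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the bits of a float into a 32-bit integer.
inline std::uint32_t EncodeFloatBits(float value) {
  std::uint32_t bits;
  static_assert(sizeof(bits) == sizeof(value), "float must be 32 bits wide");
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Hypothetical helper: the inverse of EncodeFloatBits.
inline float DecodeFloatBits(std::uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// Emit the least significant byte first. Shifts operate on the integer's
// value, so the result does not depend on the host's integer byte order.
inline void StoreLittleEndian32(std::uint32_t bits, unsigned char* out) {
  for (int i = 0; i < 4; ++i) {
    out[i] = static_cast<unsigned char>(bits >> (8 * i));
  }
}

// Rebuild the integer from four bytes stored least significant byte first.
inline std::uint32_t LoadLittleEndian32(const unsigned char* in) {
  std::uint32_t bits = 0;
  for (int i = 0; i < 4; ++i) {
    bits |= static_cast<std::uint32_t>(in[i]) << (8 * i);
  }
  return bits;
}
```

On a host with IEEE-754 floats, `1.0f` has the bit pattern `0x3F800000`, so storing it this way should produce the bytes `00 00 80 3F`. The sketch only stays portable if the float's in-memory byte order matches the host's integer byte order, which is exactly the assumption the question is about.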

So my question is: is this really good enough in practice to ensure that the serialization of floats and doubles in C++ will be transportable across platforms?

I am explicitly inserting the words "in practice" in my question because I am aware that in theory one cannot make any assumptions about how floats and doubles are actually formatted in C++, but I don't have a sense of whether this theoretical danger is actually something I should be very worried about in practice.

UPDATE

It now looks to me like the approach PB takes might be broken on SPARC. If I correctly understand this page by Oracle describing the number formats used on SPARC, SPARC uses the opposite endianness to x86 for integers but the same endianness as x86 for floats and doubles. However, PB encodes floats/doubles by first reinterpreting them as an integer type of the appropriate size (by means of a union; see the code snippets quoted in my question above), and then reversing the order of the bytes on platforms with big-endian integers:

    void CodedOutputStream::WriteLittleEndian64(uint64 value) {
      uint8 bytes[sizeof(value)];

      bool use_fast = buffer_size_ >= sizeof(value);
      uint8* ptr = use_fast ? buffer_ : bytes;

      WriteLittleEndian64ToArray(value, ptr);

      if (use_fast) {
        Advance(sizeof(value));
      } else {
        WriteRaw(bytes, sizeof(value));
      }
    }

    inline uint8* CodedOutputStream::WriteLittleEndian64ToArray(uint64 value,
                                                                uint8* target) {
    #if defined(PROTOBUF_LITTLE_ENDIAN)
      memcpy(target, &value, sizeof(value));
    #else
      uint32 part0 = static_cast<uint32>(value);
      uint32 part1 = static_cast<uint32>(value >> 32);

      target[0] = static_cast<uint8>(part0);
      target[1] = static_cast<uint8>(part0 >>  8);
      target[2] = static_cast<uint8>(part0 >> 16);
      target[3] = static_cast<uint8>(part0 >> 24);
      target[4] = static_cast<uint8>(part1);
      target[5] = static_cast<uint8>(part1 >>  8);
      target[6] = static_cast<uint8>(part1 >> 16);
      target[7] = static_cast<uint8>(part1 >> 24);
    #endif
      return target + sizeof(value);
    }

This, however, is exactly the wrong thing to do for floats/doubles on SPARC, since the bytes are already in the "correct" order.

So in conclusion, if my understanding is correct, then floating-point numbers are not transportable between SPARC and x86 using PB, because PB essentially assumes that all numbers are stored with the same endianness (relative to other platforms) as the integers on a given platform, which is an incorrect assumption on SPARC.

UPDATE 2

As Lyke pointed out, IEEE 64-bit floating-point values are stored in big-endian order on SPARC, in contrast to x86. However, only the two 32-bit words are in reverse order, not all 8 of the bytes, and in particular IEEE 32-bit floating-point values look like they are stored in the same order as on x86.
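Rather than relying on a reading of the documentation, the hypothesis in these updates can be checked directly on any given machine. The following is a minimal sketch, assuming an IEEE-754 64-bit `double`; the function name `DoubleByteOrderMatchesInteger` is made up for illustration. It compares the in-memory bytes of `1.0` with the bytes of a `uint64_t` holding the bit pattern IEEE-754 assigns to `1.0`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical check: returns true when a double and a 64-bit integer with
// the same IEEE-754 bit pattern have identical byte layouts in memory.
// If this returns true, reinterpreting the double as an integer and then
// byte-swapping according to integer endianness gives the intended result.
inline bool DoubleByteOrderMatchesInteger() {
  const double value = 1.0;
  // IEEE-754 binary64 encoding of 1.0: sign 0, exponent 0x3FF, mantissa 0.
  const std::uint64_t expected = 0x3FF0000000000000ULL;
  static_assert(sizeof(value) == sizeof(expected), "double must be 64 bits wide");
  return std::memcmp(&value, &expected, sizeof(value)) == 0;
}
```

A result of `false` would indicate a mixed layout of the kind described above (for example, swapped 32-bit words), in which case the union-then-swap approach would not round-trip between that platform and x86.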

Gregory Crosswhite asked Aug 30 '11 19:08

2 Answers

I think it should be fine so long as your target C++ platform uses IEEE-754 and the library handles the endianness properly. Basically the code you've shown is assuming that if you've got the right bits in the right order and an IEEE-754 implementation, you'll get the right value. The endianness is handled by protocol buffers, and the IEEE-754-ness is assumed - but pretty universal.
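If you would rather have the IEEE-754 assumption checked than taken on faith, the standard library exposes it at compile time. This is a minimal sketch, not something the Protocol Buffers source quoted above does; the constant name `kHostUsesIeee754` is made up for illustration.

```cpp
#include <cstdint>
#include <limits>

// std::numeric_limits<T>::is_iec559 is true when T conforms to IEC 559,
// the standard also known as IEEE-754.
constexpr bool kHostUsesIeee754 =
    std::numeric_limits<float>::is_iec559 &&
    std::numeric_limits<double>::is_iec559;

// Fail the build early on a platform where the assumptions do not hold.
static_assert(kHostUsesIeee754, "float/double are not IEEE-754 on this platform");
static_assert(sizeof(float) == sizeof(std::uint32_t), "float is not 32 bits wide");
static_assert(sizeof(double) == sizeof(std::uint64_t), "double is not 64 bits wide");
```

Note that this only covers the format of the values. It says nothing about the byte order in memory, which is the part the library is relied on to handle.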

Jon Skeet answered Nov 02 '22 06:11


In practice, the fact that they are writing and reading with the endianness enforced is enough to maintain portability. This is fairly evident, considering the widespread use of Protocol Buffers across many platforms (and even languages).

Reed Copsey answered Nov 02 '22 05:11