
Google Protobuf ByteString vs. Byte[]

I am working with Google Protobuf in Java. I see that it is possible to serialize a protobuf message to a String, byte[], ByteString, etc. (Source: https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite)

I don't know what a ByteString is. I got the following definition from the protobuf API documentation (source: https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/ByteString): "Immutable sequence of bytes. Substring is supported by sharing the reference to the immutable underlying bytes, as with String."

It is not clear to me how a ByteString is different from a String or byte[]. Can somebody please explain? Thanks.

Rahim Pirbhai asked Mar 12 '15 19:03




2 Answers

You can think of ByteString as an immutable byte array. That's pretty much it. It's a byte[] which you can use in a protobuf. Protobuf does not let you use Java arrays because they're mutable.
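To make the "immutable byte array" idea concrete, here is a minimal sketch using only the JDK. `ImmutableBytes` and `MutabilityDemo` are hypothetical names invented for this illustration, and this is not how protobuf implements ByteString; it only shows the defensive-copy behaviour you get from calls such as `ByteString.copyFrom(byte[])` and `toByteArray()`.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the protobuf implementation. */
final class ImmutableBytes {
    private final byte[] bytes;

    private ImmutableBytes(byte[] bytes) {
        this.bytes = bytes;
    }

    /** Copies the input so later changes to the caller's array are not visible here. */
    static ImmutableBytes copyFrom(byte[] source) {
        return new ImmutableBytes(Arrays.copyOf(source, source.length));
    }

    int size() {
        return bytes.length;
    }

    byte byteAt(int index) {
        return bytes[index];
    }

    /** Returns a fresh copy so callers can never reach the internal array. */
    byte[] toByteArray() {
        return Arrays.copyOf(bytes, bytes.length);
    }
}

class MutabilityDemo {
    public static void main(String[] args) {
        byte[] raw = {1, 2, 3};
        ImmutableBytes frozen = ImmutableBytes.copyFrom(raw);

        raw[0] = 99;                   // mutate the caller's original array
        frozen.toByteArray()[1] = 99;  // mutate a copy handed out by the wrapper

        System.out.println(raw[0]);            // the plain array changed
        System.out.println(frozen.byteAt(0));  // the wrapper still holds 1
        System.out.println(frozen.byteAt(1));  // the wrapper still holds 2
    }
}
```

A plain byte[] field offers no such guarantee: anyone holding a reference to the array can change its contents after the message has been built.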

ByteString exists because String is not suitable for representing arbitrary sequences of bytes. String is specifically for character data.

The protobuf MessageLite Interface provides toByteArray() and toByteString() methods. If ByteString is an immutable byte[], would the byte representation of a message represented by both ByteString and byte[] be the same?

Sort of. If you call toByteArray() you'll get the same value as if you were to call toByteString().toByteArray(). Compare the implementation of the two methods, in AbstractMessageLite:

    public ByteString toByteString() {
      try {
        final ByteString.CodedBuilder out =
          ByteString.newCodedBuilder(getSerializedSize());
        writeTo(out.getCodedOutput());
        return out.build();
      } catch (IOException e) {
        throw new RuntimeException(
          "Serializing to a ByteString threw an IOException (should " +
          "never happen).", e);
      }
    }

    public byte[] toByteArray() {
      try {
        final byte[] result = new byte[getSerializedSize()];
        final CodedOutputStream output = CodedOutputStream.newInstance(result);
        writeTo(output);
        output.checkNoSpaceLeft();
        return result;
      } catch (IOException e) {
        throw new RuntimeException(
          "Serializing to a byte array threw an IOException " +
          "(should never happen).", e);
      }
    }

Matt Ball answered Sep 23 '22 19:09


A ByteString lets you perform more operations on the underlying data without copying it into a new structure. For instance, to pass a subset of the bytes in a byte[] to another method, you would have to either copy that range into a new array or pass the whole array along with a start index and an end index.

With a ByteString you can instead hand the method a substring of the data, and the method does not need to know anything about the underlying storage, just as with a substring of a normal String. You can also concatenate ByteStrings without manually allocating a new array and copying the data into it.
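The substring point can be sketched with a small JDK-only class. `ByteSlice`, `SliceDemo`, and `sharesStorageWith` are hypothetical names made up for this example, and the code is an assumption about the general technique (a view defined by an offset and a length over a shared immutable array), not protobuf's actual implementation. In protobuf-java the corresponding call is `ByteString.substring(int, int)`.

```java
import java.util.Arrays;

/** Hypothetical immutable view over a byte array; not the protobuf implementation. */
final class ByteSlice {
    private final byte[] data;   // never modified after construction
    private final int offset;
    private final int length;

    private ByteSlice(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    /** Copies once on the way in so the backing array is private to this class. */
    static ByteSlice copyFrom(byte[] source) {
        return new ByteSlice(Arrays.copyOf(source, source.length), 0, source.length);
    }

    int size() {
        return length;
    }

    byte byteAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[offset + index];
    }

    /** Returns a view of [beginIndex, endIndex) that shares the backing array. */
    ByteSlice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException(beginIndex + ".." + endIndex);
        }
        return new ByteSlice(data, offset + beginIndex, endIndex - beginIndex);
    }

    /** Exposed only so the demo can show that no bytes were copied. */
    boolean sharesStorageWith(ByteSlice other) {
        return data == other.data;
    }
}

class SliceDemo {
    /** The callee sees a complete value; it needs no start or end index. */
    static int sum(ByteSlice slice) {
        int total = 0;
        for (int i = 0; i < slice.size(); i++) {
            total += slice.byteAt(i);
        }
        return total;
    }

    public static void main(String[] args) {
        ByteSlice all = ByteSlice.copyFrom(new byte[] {10, 20, 30, 40, 50});
        ByteSlice middle = all.substring(1, 4);  // bytes 20, 30, 40

        System.out.println(sum(middle));                    // 90
        System.out.println(middle.sharesStorageWith(all));  // true
    }
}
```

Sharing the backing array is only safe because the bytes can never change, which is why the same trick is not available for a plain byte[].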

A String is for representing text and is not a good way to store binary data: not every byte sequence is valid text, so binary data only survives in a String if you first encode it in a text-safe form such as hex or Base64.
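A short JDK-only sketch shows why. The class and method names (`BinaryAsTextDemo`, `viaUtf8String`, `viaBase64`) are hypothetical, and the sample bytes are arbitrary values chosen because 0xFF and 0xFE are not valid UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BinaryAsTextDemo {
    /** Treats raw bytes as UTF-8 text and converts back; invalid bytes get replaced. */
    static byte[] viaUtf8String(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Encodes the bytes as Base64 text first, so every byte survives the round trip. */
    static byte[] viaBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};

        // The decoder substitutes a replacement character for each invalid byte,
        // so the original values are lost.
        System.out.println(Arrays.equals(binary, viaUtf8String(binary)));  // false

        // Base64 maps arbitrary bytes onto plain ASCII characters and back.
        System.out.println(Arrays.equals(binary, viaBase64(binary)));      // true
    }
}
```

A byte-oriented type such as byte[] or ByteString avoids the problem entirely, because no charset decoding is ever applied to the data.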

Chris Thompson answered Sep 24 '22 19:09