We're thinking of using Protocol Buffers for binary logging.
That said, it isn't obvious how we should go about it, because the APIs tend to focus on creating whole objects. In messaging terms you'd wrap a list of DataLogEntry as a repeated field in a DataLogFile, but what we really want is just to write a whole DataLogEntry, appending it to the end of a file, and later read each one back out.
The first issue we've hit is that this code (in a test):
FileInputStream fileIn = new FileInputStream(logFile);
CodedInputStream in = CodedInputStream.newInstance(fileIn);
while (!in.isAtEnd()) {
    DataLogEntry entry = DataLogEntry.parseFrom(in);
    // ... do stuff
}
only results in one DataLogEntry being read from the stream. Without the isAtEnd check, it never stops.
Thoughts?
Edit: I've switched to using entry.writeDelimitedTo and DataLogEntry.parseDelimitedFrom, and that seems to work.
Protocol Buffers (Protobuf) is a free, open-source, cross-platform data format created by Google for serializing structured data, useful both for programs communicating over a network and for storing data. Google open-sourced the protocol, and it now provides out-of-the-box support for the most common languages, such as JavaScript, Java, C#, Ruby and others. Protobuf is a binary transfer format: because the data is transmitted as a compact binary encoding rather than a raw string, it takes less space and bandwidth.
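Much of that compactness comes from protobuf's base-128 varint wire encoding, in which each byte carries seven bits of the value (low group first) and the high bit signals that more bytes follow. A minimal sketch in plain Java, with no protobuf dependency (the class and method names here are made up for illustration):

```java
import java.io.ByteArrayOutputStream;

// Sketch of protobuf's base-128 varint encoding, the building block of its
// compact wire format. Small numbers take a single byte; e.g. 300 encodes
// as two bytes, 0xAC 0x02, per the protobuf encoding documentation.
public class VarintSketch {
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            // Emit the low 7 bits with the continuation bit set.
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value); // final byte: continuation bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeVarint(1).length);   // 1
        System.out.println(encodeVarint(300).length); // 2
    }
}
```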
From my understanding of protocol buffers, they do not directly support reading multiple messages from a single stream, so you will probably need to track the boundaries of the messages yourself. You can do this by storing the size of each message in the log immediately before the message itself.
public class DataLog {
    public void write(final DataOutputStream out, final DataLogEntry entry) throws IOException {
        // Length-prefix each entry so the reader knows where it ends.
        out.writeInt(entry.getSerializedSize());
        CodedOutputStream codedOut = CodedOutputStream.newInstance(out);
        entry.writeTo(codedOut);
        codedOut.flush();
    }

    public void read(final DataInputStream in) throws IOException {
        byte[] buffer = new byte[4096];
        while (true) {
            try {
                int size = in.readInt();
                CodedInputStream codedIn;
                if (size <= buffer.length) {
                    // readFully, unlike read, guarantees the whole entry is read.
                    in.readFully(buffer, 0, size);
                    codedIn = CodedInputStream.newInstance(buffer, 0, size);
                } else {
                    byte[] tmp = new byte[size];
                    in.readFully(tmp);
                    codedIn = CodedInputStream.newInstance(tmp);
                }
                DataLogEntry entry = DataLogEntry.parseFrom(codedIn);
                // ... do stuff
            } catch (final EOFException e) {
                break;
            }
        }
    }
}
NB: I've used an EOFException to detect the end of the file; you may wish to use a delimiter or track the number of bytes read manually.
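The framing can be exercised in isolation with raw byte payloads standing in for serialized DataLogEntry messages, so the sketch below runs without protobuf on the classpath (the class and method names are made up for illustration):

```java
import java.io.*;
import java.util.*;

// Sketch of the length-prefix framing above: a 4-byte length before each
// record, read back with readFully until a clean EOF.
public class FramingSketch {
    static void writeRecord(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length); // length prefix, as in the log writer
        out.write(payload);
    }

    static List<byte[]> readAll(DataInputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        while (true) {
            int size;
            try {
                size = in.readInt(); // EOF here means a clean end of file
            } catch (EOFException e) {
                return records;
            }
            byte[] payload = new byte[size];
            in.readFully(payload);   // readFully avoids short reads
            records.add(payload);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeRecord(out, "first".getBytes("UTF-8"));
        writeRecord(out, "second".getBytes("UTF-8"));

        List<byte[]> back = readAll(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.size());                      // 2
        System.out.println(new String(back.get(1), "UTF-8")); // second
    }
}
```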
As of 2.4.0a, at least, this is easy: write your message with writeDelimitedTo and read it back with parseDelimitedFrom. There's no need to use the coded streams directly.
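Under the hood, the delimited format is a varint length prefix followed by the message bytes. A self-contained sketch of that layout, with raw payloads standing in for messages so it runs without protobuf on the classpath (the class and method names are made up for illustration):

```java
import java.io.*;

// Sketch of the delimited wire layout: a base-128 varint length prefix,
// then the message bytes. readDelimited returns null at a clean end of
// stream, mirroring parseDelimitedFrom's behavior.
public class DelimitedSketch {
    static void writeDelimited(OutputStream out, byte[] msg) throws IOException {
        int v = msg.length;
        while ((v & ~0x7F) != 0) {   // emit 7 bits at a time, low group first
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        out.write(msg);
    }

    static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) return null;    // clean end of stream
        int size = b & 0x7F, shift = 7;
        while ((b & 0x80) != 0) {    // continuation bit set: more length bytes
            b = in.read();
            size |= (b & 0x7F) << shift;
            shift += 7;
        }
        byte[] msg = new byte[size];
        new DataInputStream(in).readFully(msg);
        return msg;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeDelimited(buf, "one".getBytes("UTF-8"));
        writeDelimited(buf, "two".getBytes("UTF-8"));
        InputStream in = new ByteArrayInputStream(buf.toByteArray());
        for (byte[] m; (m = readDelimited(in)) != null; ) {
            System.out.println(new String(m, "UTF-8"));
        }
    }
}
```

With the real API, the read loop is the same shape: call parseDelimitedFrom until it signals end of stream.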