
Using protocol buffers for binary logging

We're thinking of using Protocol Buffers for binary logging because:

  • It's how we're encoding our objects anyway
  • It is relatively compact and fast to read/write, etc.

That said, it isn't obvious how we should go about it, because the APIs tend to focus on creating whole objects. In messaging terms you'd wrap a list of DataLogEntry as a repeated field in a DataLogFile, but what we really want is simply to write a whole DataLogEntry, appending it to the end of a file, and later read each one back out.

The first issue we've hit is that doing this (in a test):

        FileInputStream fileIn = new FileInputStream(logFile);
        CodedInputStream in = CodedInputStream.newInstance(fileIn);
        while(!in.isAtEnd()) {
            DataLogEntry entry = DataLogEntry.parseFrom(in);
            // ... do stuff
        }

only results in one DataLogEntry being read from the stream. Without the isAtEnd() check, it never stops.

Thoughts?

Edit: I've switched to using entry.writeDelimitedTo and DataLogEntry.parseDelimitedFrom and that seems to work...
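
Roughly, the delimited round trip looks like this (a minimal sketch: DataLogEntry is our generated class, logFile and entry are as in the snippet above, and append-mode output is just an assumption about how the log is written):

        // Writing: writeDelimitedTo prefixes each message with its length.
        FileOutputStream fileOut = new FileOutputStream(logFile, true); // append to the log
        entry.writeDelimitedTo(fileOut);
        fileOut.close();

        // Reading: parseDelimitedFrom returns null once the stream is exhausted.
        FileInputStream fileIn = new FileInputStream(logFile);
        DataLogEntry next;
        while ((next = DataLogEntry.parseDelimitedFrom(fileIn)) != null) {
            // ... do stuff
        }
        fileIn.close();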

asked Mar 10 '10 by Jamie McCrindle

People also ask

What are Protobuf Protocol Buffers useful for?

Protocol Buffers (Protobuf) is a free, open-source, cross-platform data format used to serialize structured data. It is useful for developing programs that communicate with each other over a network, and for storing data.

Is Protobuf a binary?

Protocol Buffers, or Protobuf, is a binary format created by Google to serialize data between different services. Google made the protocol open source, and it now has out-of-the-box support for the most common languages, such as JavaScript, Java, C#, Ruby and others.

How do protocol buffers work?

Protobuf is a binary transfer format, meaning the data is transmitted as binary. This makes transmission faster than sending raw strings, because the compact encoding takes less space and bandwidth.

Is Protobuf faster than JSON?

TL;DR: encoding and decoding string-intensive data in JavaScript is faster with JSON than with Protobuf. When you have structured data in JavaScript that needs to be sent over the network (to another microservice, for example) or saved into a storage system, it first needs to be serialized.


2 Answers

From my understanding, protocol buffers do not support multiple messages in a single stream, so you will probably need to track the boundaries of the messages yourself. You can do this by storing the size of each message in the log just before the message itself.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

import com.google.protobuf.CodedInputStream;
import com.google.protobuf.CodedOutputStream;

public class DataLog {

    // Prefix each entry with its serialized size so the reader knows
    // where one message ends and the next begins.
    public void write(final DataOutputStream out, final DataLogEntry entry) throws IOException {
        out.writeInt(entry.getSerializedSize());
        CodedOutputStream codedOut = CodedOutputStream.newInstance(out);
        entry.writeTo(codedOut);
        codedOut.flush();
    }

    public void read(final DataInputStream in) throws IOException {
        byte[] buffer = new byte[4096];
        while (true) {
            try {
                int size = in.readInt();
                CodedInputStream codedIn;
                if (size <= buffer.length) {
                    // readFully guarantees the whole message is read;
                    // read() may return fewer bytes than requested.
                    in.readFully(buffer, 0, size);
                    codedIn = CodedInputStream.newInstance(buffer, 0, size);
                } else {
                    byte[] tmp = new byte[size];
                    in.readFully(tmp);
                    codedIn = CodedInputStream.newInstance(tmp);
                }
                DataLogEntry entry = DataLogEntry.parseFrom(codedIn);
                // ... do stuff with entry
            }
            catch (final EOFException e) {
                break;
            }
        }
    }
}

NB: I've used an EOFException to find the end of the file; you may wish to use a delimiter or track the number of bytes read manually.

answered by Michael Barker


As of 2.4.0a, at least, this is easy. Write your message with writeDelimitedTo. No need to use the coded streams directly.
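
For reading, the matching call is parseDelimitedFrom, which returns null once the end of the stream is reached, so a simple while loop (as sketched under the question's edit above) is all that's needed. Under the hood, writeDelimitedTo prefixes each message with its length as a varint, which is essentially the same size-prefix technique as the previous answer, handled by the library.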

answered by bmargulies