I'm trying to write an application which uses Google's protocol buffers to deserialize data (sent from another application using protocol buffers) over a TCP connection. The problem is that it looks as if protocol buffers in Python can only deserialize data from a string. Since TCP doesn't have well-defined message boundaries and one of the messages I'm trying to receive has a repeated field, I won't know how much data to try and receive before finally passing the string to be deserialized.
Are there any good practices for doing this in Python?
Don't just write the serialized data to the socket. First send a fixed-size field containing the length of the serialized object.
The sending side is roughly:
import struct

sock.sendall(struct.pack("H", len(data)))  # send a two-byte size field
sock.sendall(data)                         # then send the serialized message itself
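If you want the framing in one place, a minimal sketch of a send helper could look like this (send_message is just an illustrative name, not part of any library):

import struct

def send_message(sock, payload):
    # Two-byte length prefix followed by the serialized protobuf bytes.
    # Bare "H" is native byte order, matching the snippet above; use "!H"
    # if you want a fixed network byte order on the wire.
    sock.sendall(struct.pack("H", len(payload)))
    sock.sendall(payload)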
And the receiving side becomes something like:
dataToRead = struct.unpack("H", sock.recv(2))[0]
data = sock.recv(dataToRead)
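One caveat: on a TCP stream, sock.recv(n) may return fewer than n bytes, so in practice you loop until the whole length field and payload have arrived. A minimal sketch of such a helper (recv_exactly is just an illustrative name, not a socket method):

def recv_exactly(sock, n):
    # Keep calling recv() until exactly n bytes have been collected.
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before %d bytes arrived" % n)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

dataToRead = struct.unpack("H", recv_exactly(sock, 2))[0]
data = recv_exactly(sock, dataToRead)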
This is a common design pattern for socket programming. Most designs extend the over-the-wire structure to include a type field as well, so your receiving side becomes something like:
msgType = sock.recv(1)[0]                         # get the type of the msg
dataToRead = struct.unpack("H", sock.recv(2))[0]  # get the len of the msg
data = sock.recv(dataToRead)                      # read the msg

if msgType == TYPE_FOO:
    handleFoo(data)
elif msgType == TYPE_BAR:
    handleBar(data)
else:
    raise UnknownTypeException(msgType)
You end up with an over-the-wire message format that looks like:
struct {
    unsigned char  type;
    unsigned short length;
    unsigned char  data[];   /* `length` bytes of payload follow the header */
};
This does a reasonable job of future-proofing the wire protocol against unforeseen requirements. It's a Type-Length-Value protocol, which you'll find again and again and again in network protocols.
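Putting the type and length fields together, a minimal sketch of the framing helpers might look like this (send_tlv and recv_tlv are illustrative names, and recv_exactly is the helper sketched above):

import struct

def send_tlv(sock, msgType, payload):
    # One type byte, then a two-byte length, then the serialized protobuf bytes.
    header = struct.pack("B", msgType) + struct.pack("H", len(payload))
    sock.sendall(header + payload)

def recv_tlv(sock):
    # Mirror image of send_tlv: read the fixed-size header, then the payload.
    msgType = recv_exactly(sock, 1)[0]
    length = struct.unpack("H", recv_exactly(sock, 2))[0]
    return msgType, recv_exactly(sock, length)

Note that a two-byte length field caps each message at 65,535 bytes; if your repeated fields can grow beyond that, a four-byte "I" length field is a common choice.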
To expand on J.J.'s (entirely correct) answer: the protobuf library has no way on its own to work out how long messages are, or what type of protobuf object is being sent*. So the other application that's sending you data must already be doing something like this.
When I had to do this, I implemented a lookup table:
messageLookup = {
    0: foobar_pb2.MessageFoo,
    1: foobar_pb2.MessageBar,
    2: foobar_pb2.MessageBaz,
}
...and did essentially what J.J. did, but I also had a helper function:
def parseMessage(self, msgType, stringMessage):
    msgClass = messageLookup[msgType]
    message = msgClass()
    message.ParseFromString(stringMessage)
    return message
...which I called to turn the string into a protobuf object.
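For context, here is roughly how that fits together with the framing from the other answer (handleConnection is just an illustrative method on the same class as parseMessage, and recv_tlv is the helper sketched above):

def handleConnection(self, sock):
    while True:
        msgType, payload = recv_tlv(sock)              # type byte, two-byte length, payload
        message = self.parseMessage(msgType, payload)  # pick the right generated class and parse
        # ...dispatch on msgType (or on type(message)) from here...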
(*) I think it's possible to get round this by encapsulating specific messages inside a container message.
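For what it's worth, the container-message approach usually means wrapping every payload in a single envelope type with a oneof field, so the receiver only ever parses one message class. A sketch, assuming a schema along these lines (the Envelope message and its field names are made up for illustration):

# Assumed .proto, compiled by protoc into foobar_pb2:
#
#   message Envelope {
#     oneof payload {
#       MessageFoo foo = 1;
#       MessageBar bar = 2;
#       MessageBaz baz = 3;
#     }
#   }

envelope = foobar_pb2.Envelope()
envelope.ParseFromString(data)          # data is one length-prefixed frame off the socket
kind = envelope.WhichOneof("payload")   # "foo", "bar", "baz", or None if nothing is set
if kind == "foo":
    handleFoo(envelope.foo)

You still need the length prefix to delimit frames, but the separate type byte becomes unnecessary.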