I need to receive a protocol buffers message on my python - tornado server and get the stuff out of the binary message.
postContent = self.request.body
message = prototemp.ReqMessage()
message.ParseFromString(postContent)
It works perfectly using a test tool. When i run it in sandbox environment and simulate 1000 requests from my client, it works in certain cases, but in most of the requests, it throws an exception -
File "server1.py", line 21, in post
message.ParseFromString(postContent)
File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/message.py", line 179, in ParseFromString
self.MergeFromString(serialized)
File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 755, in MergeFromString
if self._InternalParse(serialized, 0, length) != length:
File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 782, in InternalParse
pos = field_decoder(buffer, new_pos, end, self, field_dict)
File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/decoder.py", line 544, in DecodeField
if value._InternalParse(buffer, pos, new_pos) != new_pos:
File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 782, in InternalParse
pos = field_decoder(buffer, new_pos, end, self, field_dict)
File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/decoder.py", line 410, in DecodeField
field_dict[key] = local_unicode(buffer[pos:new_pos], 'utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 1: invalid continuation byte
In some other cases it gives these errors -
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 3: invalid start byte
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 3: unexpected end of data
What could be the reason ?
I had exactly same problem with RabbitMQ and Protocol Buffers. The problem is that protocol buffer assumes the input to be of type str, whereas RabbitMQ seems to decode the message as unicode in some cases (if the byte array contains bytes greater than 127). The same may happen with Tornado as well. So far it seems, that the problem can be solved by following piece of code:
body = self.request.body
if type(body) == unicode:
data = bytearray(body, "utf-8")
body = bytes(data)
message = whatever.FromString(body)
This code turns the unicode string to python bytes object, which can be happily parsed by protocol buffer messages. Dunno if there is some better way to do this, but at least this seems to work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With