Protocol buffers python - unicode decode error

Question

I need to receive a Protocol Buffers message on my Python Tornado server and extract the fields from the binary payload.

postContent = self.request.body
message = prototemp.ReqMessage()
message.ParseFromString(postContent)

It works perfectly with a test tool. When I run it in a sandbox environment and simulate 1000 requests from my client, some requests succeed, but most of them throw an exception:

  File "server1.py", line 21, in post
    message.ParseFromString(postContent)
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/message.py", line 179, in ParseFromString
    self.MergeFromString(serialized)
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 755, in MergeFromString
    if self._InternalParse(serialized, 0, length) != length:
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 782, in InternalParse
    pos = field_decoder(buffer, new_pos, end, self, field_dict)
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/decoder.py", line 544, in DecodeField
    if value._InternalParse(buffer, pos, new_pos) != new_pos:
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 782, in InternalParse
    pos = field_decoder(buffer, new_pos, end, self, field_dict)
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/decoder.py", line 410, in DecodeField
    field_dict[key] = local_unicode(buffer[pos:new_pos], 'utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 1: invalid continuation byte

In other cases it gives these errors:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 3: invalid start byte

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 3: unexpected end of data

What could be the reason?
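
The three messages quoted above are three different ways for a byte sequence to be invalid UTF-8, which is consistent with binary data reaching the string-field decoder in the traceback. Below is a minimal sketch that reports the same details for a given byte string. The helper name utf8_failure and the sample byte strings are made up for illustration and are not taken from the failing requests.

```python
def utf8_failure(raw):
    """Return (offending byte, position, reason) if raw is not valid UTF-8, else None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the codec rejected.
        bad_byte = ord(raw[exc.start:exc.start + 1])
        return (hex(bad_byte), exc.start, exc.reason)
    return None


# Illustrative inputs only (hypothetical, not the asker's payloads).
samples = [
    b"A\xceA",     # lead byte of a 2-byte sequence followed by plain ASCII
    b"abc\xbf",    # continuation byte with no lead byte before it
    b"abc\xe7",    # lead byte of a 3-byte sequence at the very end
    b"plain text", # valid UTF-8, so no failure is reported
]

for raw in samples:
    print(repr(raw), utf8_failure(raw))
```

Running this kind of check on the raw request body (or on the bytes of a single string field) can help tell a damaged payload apart from one that never contained UTF-8 text in that field.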

user2472341 · Accepted Answer

I had exactly the same problem with RabbitMQ and Protocol Buffers. The problem is that Protocol Buffers expects the input to be of type str, whereas RabbitMQ seems to decode the message as unicode in some cases (when the byte array contains bytes greater than 127). The same may happen with Tornado. So far, the problem seems to be solved by the following piece of code:

body = self.request.body
# Python 2: if the body arrived as a unicode object, encode it back to a byte string.
if isinstance(body, unicode):
    body = body.encode("utf-8")
message = whatever.FromString(body)

This code turns the unicode string into a Python bytes object, which Protocol Buffers messages can parse without trouble. I don't know whether there is a better way to do this, but at least this seems to work.
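
The same idea can be wrapped in a small helper that runs on both Python 2 and Python 3. This is only a sketch: the name to_wire_bytes is hypothetical, and the default of UTF-8 follows the snippet above. Whether re-encoding restores the original bytes depends on how the framework decoded them in the first place, so treat the encoding as an assumption to verify.

```python
import sys

# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_wire_bytes(body, encoding="utf-8"):
    """Return body as a byte string suitable for a protobuf parser.

    Text input is encoded with `encoding` (an assumption, see above);
    input that is already bytes is returned unchanged in value.
    """
    if isinstance(body, text_type):
        return body.encode(encoding)
    return bytes(body)


# Text is encoded, bytes pass through.
print(repr(to_wire_bytes(u"\u03bf")))
print(repr(to_wire_bytes(b"\x08\x01")))
```

In the handler this would be used as message = whatever.FromString(to_wire_bytes(self.request.body)), where whatever stands for the generated message class, as in the answer.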

Tags: python, unicode, utf-8, protocol-buffers

Aditya Singh
