
Decode content while reading from a socket in Python

Assume I read some content from a socket in Python and have to decode it as UTF-8 on the fly.

I cannot afford to keep all the content in memory, so I must decode it as I receive it and save it to a file.

It can happen that I receive only part of a character's bytes. For example, the € sign is encoded as three bytes, written in Python as '\xe2\x82\xac'.

Assume I have received only the first two bytes (\xe2\x82). If I try to decode them, I get a UnicodeDecodeError, as expected.

I could always try to decode the current content and check whether it raises an exception.

  • But how reliable is this approach?
  • How can I determine whether the current content can be decoded?
  • What is the correct way to do this?

Thanks

user2624744 asked May 15 '26 22:05



1 Answer

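Guido's time machine strikes again: the codecs module already provides an incremental decoder, which holds back an incomplete byte sequence until the remaining bytes arrive.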

>>> import codecs
>>> # Python 2 session; in Python 3, pass bytes literals such as b'foo\xe2\x82'
>>> dec = codecs.getincrementaldecoder('utf-8')()
>>> dec.decode('foo\xe2\x82')
u'foo'
>>> dec.decode('\xac')
u'\u20ac'
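
As a rough illustration of how this could be wired into the "decode as I receive and save to file" loop from the question, here is a minimal Python 3 sketch. The helper name decode_stream and the sample chunks are hypothetical, not part of the original answer; an in-memory io.StringIO stands in for the output file so the example is self-contained.

```python
import codecs
import io


def decode_stream(chunks, out, encoding="utf-8"):
    """Decode an iterable of byte chunks and write the text to `out`.

    `decode_stream` is a hypothetical helper name used for illustration.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        # Bytes of an incomplete trailing character stay buffered inside
        # the decoder and are emitted once the rest of the character arrives.
        text = decoder.decode(chunk)
        if text:
            out.write(text)
    # final=True flushes the decoder; it raises UnicodeDecodeError if the
    # stream ended in the middle of a character.
    out.write(decoder.decode(b"", final=True))


# Simulated network chunks: the euro sign is split across two reads.
chunks = [b"foo\xe2\x82", b"\xac bar"]
buf = io.StringIO()
decode_stream(chunks, buf)
result = buf.getvalue()
```

With a real socket, the chunks could presumably be produced by something like iter(lambda: sock.recv(4096), b""), and out would be a file opened in text mode with an explicit encoding. Only one chunk of data is held in memory at a time, plus the few bytes the decoder is buffering.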
Ignacio Vazquez-Abrams answered May 17 '26 12:05



