
How to convert between bytes and strings in Python 3?

This is a Python 101 type question, but it had me baffled for a while when I tried to use a package that seemed to convert my string input into bytes.

As you will see below, I found the answer myself, but I felt it was worth recording here because of the time it took me to work out what was going on. The behaviour seems to be generic to Python 3, so I have not named the package I was using. It does not appear to be a bug: the package simply had a .tostring() method that did not produce what I understood as a string.

My test program goes like this:

    import mangler                                 # spoof package

    stringThing = """
    <Doc>
        <Greeting>Hello World</Greeting>
        <Greeting>你好</Greeting>
    </Doc>
    """

    # print out the input
    print('This is the string input:')
    print(stringThing)

    # now make the string into bytes
    bytesThing = mangler.tostring(stringThing)    # pseudo-code again

    # now print it out
    print('\nThis is the bytes output:')
    print(bytesThing)

The output from this code gives this:

    This is the string input:

    <Doc>
        <Greeting>Hello World</Greeting>
        <Greeting>你好</Greeting>
    </Doc>


    This is the bytes output:
    b'\n<Doc>\n    <Greeting>Hello World</Greeting>\n    <Greeting>\xe4\xbd\xa0\xe5\xa5\xbd</Greeting>\n</Doc>\n'

So there is a need to be able to convert between bytes and strings, so that non-ASCII characters do not end up as gobbledegook.

Bobble asked Dec 23 '12 11:12




2 Answers

The 'mangler' in the above code sample was doing the equivalent of this:

bytesThing = stringThing.encode(encoding='UTF-8') 

There are other ways to write this (notably bytes(stringThing, encoding='UTF-8')), but the above syntax makes it obvious what is going on, and also what to do to recover the string:

newStringThing = bytesThing.decode(encoding='UTF-8') 

When we do this, the original string is recovered.
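As a quick check of the round trip, here is a minimal sketch. The variable names and the shortened sample string are my own, not taken from the original package. It also shows what happens if the bytes are decoded with a codec that does not match the one used to encode them.

```python
# Round trip: str -> bytes -> str, using UTF-8 in both directions.
string_thing = "<Greeting>你好</Greeting>"

bytes_thing = string_thing.encode(encoding="UTF-8")
new_string_thing = bytes_thing.decode(encoding="UTF-8")

print(bytes_thing)                        # bytes literal with \x escapes
print(new_string_thing == string_thing)   # the original text is recovered

# Decoding with a codec that cannot represent these bytes fails loudly.
try:
    bytes_thing.decode(encoding="ascii")
    wrong_codec_failed = False
except UnicodeDecodeError as exc:
    wrong_codec_failed = True
    print("ascii decode failed:", exc.reason)
```

The encoding has to be the same in both directions; UTF-8 is also the default for encode() and decode() in Python 3, so spelling it out is mainly for the reader's benefit.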

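Note that str(bytesThing) does not decode anything: it returns the printable representation of the bytes object (the b'...' prefix, the \x escapes and all) as a string, so the gobbledegook is simply transcribed. To get the text back you must request the encoding explicitly, viz., str(bytesThing, encoding='UTF-8'). By default, no error is reported if the encoding is not specified.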
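The difference between the two str() calls is easy to miss, so here is a small sketch of it. The names as_repr and as_text are illustrative only.

```python
bytes_thing = "你好".encode("UTF-8")

# Without an encoding, str() gives the repr of the bytes object as text.
as_repr = str(bytes_thing)

# With an encoding, str() decodes the bytes, like bytes_thing.decode("UTF-8").
as_text = str(bytes_thing, encoding="UTF-8")

print(as_repr)   # still shows the b'...' prefix and \x escapes
print(as_text)   # the original characters
print(len(as_repr), len(as_text))
```

Because the first form succeeds silently, the mistake tends to surface only later, when a literal b'...' turns up in output or in a file.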

Bobble answered Oct 03 '22 22:10



In Python 3, the bytes() constructor accepts a string and an encoding, and gives the same result as calling encode() on the string.

    str1 = b'hello world'
    str2 = bytes("hello world", encoding="UTF-8")
    print(str1 == str2) # prints True

I didn't read anything about this in the docs, but perhaps I wasn't looking in the right place. This way you can explicitly turn strings into bytes objects, which I find more readable than using encode and decode, and without having to prefix the quotes with b.
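To show that the constructor form and the method form agree on non-ASCII text as well, here is a short sketch. The variable names are my own. Note that a b'...' literal can only contain ASCII characters, so for text like this one of the two conversions is required anyway.

```python
text = "你好"

via_bytes = bytes(text, encoding="UTF-8")     # constructor form
via_encode = text.encode(encoding="UTF-8")    # method form

print(via_bytes == via_encode)

# The reverse direction has the same pair of spellings.
back_via_str = str(via_bytes, encoding="UTF-8")
back_via_decode = via_bytes.decode(encoding="UTF-8")

print(back_via_str == back_via_decode == text)
```

Which spelling reads better is a matter of taste; both pairs produce identical objects.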

NuclearPeon answered Oct 04 '22 00:10
