
How to convert between bytes and strings in Python 3?

Tags:

string

python-3.x

byte

This is a Python 101 type question, but it had me baffled for a while when I tried to use a package that seemed to convert my string input into bytes.

As you will see below, I found the answer myself, but I felt it was worth recording here because of the time it took me to work out what was going on. The behaviour seems to be generic to Python 3, so I have not named the original package I was using; it does not appear to be a bug, just that the package had a .tostring() method that was clearly not producing what I understood as a string.

My test program goes like this:

    import mangler                                 # spoof package

    stringThing = """
    <Doc>
        <Greeting>Hello World</Greeting>
        <Greeting>你好</Greeting>
    </Doc>
    """

    # print out the input
    print('This is the string input:')
    print(stringThing)

    # now make the string into bytes
    bytesThing = mangler.tostring(stringThing)    # pseudo-code again

    # now print it out
    print('\nThis is the bytes output:')
    print(bytesThing)

The output from this code gives this:

    This is the string input:

    <Doc>
        <Greeting>Hello World</Greeting>
        <Greeting>你好</Greeting>
    </Doc>


    This is the bytes output:
    b'\n<Doc>\n    <Greeting>Hello World</Greeting>\n    <Greeting>\xe4\xbd\xa0\xe5\xa5\xbd</Greeting>\n</Doc>\n'

So I need to be able to convert between bytes and strings, so that non-ASCII characters do not end up displayed as escape-sequence gobbledegook.

503

asked Dec 23 '12 11:12

Bobble


2 Answers

The 'mangler' in the above code sample was doing the equivalent of this:

bytesThing = stringThing.encode(encoding='UTF-8')

There are other ways to write this (notably bytes(stringThing, encoding='UTF-8')), but the above syntax makes it obvious what is going on, and also what to do to recover the string:

newStringThing = bytesThing.decode(encoding='UTF-8')

When we do this, the original string is recovered.
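As a minimal sketch of that round trip (the sample text and the variable names here are illustrative, not taken from the original package):

```python
# Round trip between str and bytes with an explicit encoding.
text = "Hello World 你好"

encoded = text.encode(encoding="UTF-8")     # str -> bytes
decoded = encoded.decode(encoding="UTF-8")  # bytes -> str

print(encoded)                   # b'Hello World \xe4\xbd\xa0\xe5\xa5\xbd'
print(decoded == text)           # True
print(len(text), len(encoded))   # 14 18
```

The two Chinese characters take three bytes each in UTF-8, which is why the bytes object is longer than the string it came from.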

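Note that str(bytesThing) does not decode anything: it returns the printable representation of the bytes object, with the b'...' prefix and the \x escapes included, rather than converting it back into Unicode text. To decode this way you must specify the encoding explicitly, viz., str(bytesThing, encoding='UTF-8'). No error is reported if the encoding is omitted.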
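A small sketch of the difference between the two str() calls, using an illustrative two-character sample rather than the document from the question:

```python
data = "你好".encode("UTF-8")

as_repr = str(data)                    # no encoding: printable form of the bytes object
as_text = str(data, encoding="UTF-8")  # with encoding: same result as data.decode("UTF-8")

print(as_repr)                 # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(as_text)                 # 你好
print(as_repr == repr(data))   # True
```

Note that as_repr is an ordinary str whose characters happen to spell out the b'...' literal, so it cannot simply be decoded afterwards.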

135

answered Oct 03 '22 22:10

Bobble

In Python 3, the built-in bytes() constructor accepts a string together with an encoding argument, in the same way as str.encode().

    str1 = b'hello world'
    str2 = bytes("hello world", encoding="UTF-8")
    print(str1 == str2)  # prints True

I didn't read anything about this in the docs, but perhaps I wasn't looking in the right place. This way you can explicitly turn strings into bytes, which I find more readable than using encode and decode, and you don't have to prefix the quotes with b.
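One difference worth knowing, sketched here with illustrative variable names: bytes() and str.encode() give the same result when an encoding is supplied, but bytes() refuses a string argument when no encoding is given.

```python
text = "hello world"

via_bytes = bytes(text, encoding="UTF-8")
via_encode = text.encode(encoding="UTF-8")
print(via_bytes == via_encode)  # True

# bytes() has no default encoding for str input, so this raises TypeError.
raised_type_error = False
try:
    bytes(text)
except TypeError:
    raised_type_error = True
print(raised_type_error)  # True
```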

answered Oct 04 '22 00:10

NuclearPeon
