Best way to convert string to bytes in Python 3?

Tags: python, string, python-3.x, character-encoding

There appear to be two different ways to convert a string to bytes, as seen in the answers to "TypeError: 'str' does not support the buffer interface".

Which of these methods would be better or more Pythonic? Or is it just a matter of personal preference?

    b = bytes(mystring, 'utf-8')
    b = mystring.encode('utf-8')

asked Sep 28 '11 15:09

Mark Ransom

2 Answers

If you look at the docs for bytes, they point you to bytearray:

bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

The optional source parameter can be used to initialize the array in a few different ways:

If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.

So bytes can do much more than just encode a string. It's Pythonic that it would allow you to call the constructor with any type of source parameter that makes sense.
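As a rough illustration of the source types listed in the quoted docs, here is a minimal sketch using the immutable bytes constructor, which accepts the same kinds of arguments. The variable names are only for illustration.

```python
# Each call exercises one of the source types from the quoted docs.
from_text = bytes("héllo", "utf-8")     # string: an encoding is required
zero_filled = bytes(3)                    # integer: that many null bytes
from_buffer = bytes(bytearray(b"abc"))    # buffer-protocol object: contents are copied
from_ints = bytes([104, 105])             # iterable of ints in range(256)
empty = bytes()                           # no argument: zero-length object

print(from_text)    # b'h\xc3\xa9llo'
print(zero_filled)  # b'\x00\x00\x00'
print(from_buffer)  # b'abc'
print(from_ints)    # b'hi'
print(empty)        # b''

# A string without an encoding is rejected rather than guessed at.
try:
    bytes("abc")
except TypeError as exc:
    print("TypeError:", exc)
```

Note that bytes(3) produces three null bytes, not the byte b'3', which is one reason the constructor form can be harder to read at a glance.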

For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is more self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), where there is no explicit verb.

I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you're just skipping a level of indirection if you call encode yourself.
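A quick way to convince yourself that the two spellings agree is to compare their results directly. This is only a sketch with a made-up sample string; it checks the observable behavior, not the CPython internals.

```python
text = "naïve café"  # sample string chosen for illustration

via_constructor = bytes(text, "utf-8")
via_method = text.encode("utf-8")

# Both spellings produce the same immutable bytes object contents.
print(via_constructor == via_method)  # True
print(type(via_constructor) is type(via_method) is bytes)  # True

# The optional errors argument is accepted by both forms as well.
lossy_constructor = bytes(text, "ascii", "replace")
lossy_method = text.encode("ascii", "replace")
print(lossy_constructor)                  # b'na?ve caf?'
print(lossy_constructor == lossy_method)  # True
```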

Also, see Serdalis' comment -- unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding) and symmetry is nice.
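The symmetry is easy to see in a round trip. The snippet below is a small sketch with an arbitrary sample string; it also shows that the encoded length can differ from the character count.

```python
original = "café"  # 4 characters, one of them non-ASCII

encoded = original.encode("utf-8")   # str -> bytes
decoded = encoded.decode("utf-8")    # bytes -> str, the inverse step

print(encoded)              # b'caf\xc3\xa9'
print(decoded == original)  # True
print(len(original), len(encoded))  # 4 5

# Decoding with a codec that cannot represent the data fails loudly.
try:
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```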

answered Oct 03 '22 04:10

agf

It's easier than you might think:

    my_str = "hello world"
    my_str_as_bytes = str.encode(my_str)
    type(my_str_as_bytes)  # ensure it is byte representation
    my_decoded_str = my_str_as_bytes.decode()
    type(my_decoded_str)  # ensure it is string representation
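For what it's worth, str.encode(my_str) is the same call as my_str.encode(), and in Python 3 both encode() and decode() default to UTF-8 when no encoding is given. A small sketch that checks this with isinstance instead of inspecting type() by eye:

```python
my_str = "hello world"

# Calling the method through the class is equivalent to calling it on the instance.
my_str_as_bytes = str.encode(my_str)
print(my_str_as_bytes == my_str.encode())         # True
print(my_str_as_bytes == my_str.encode("utf-8"))  # True: UTF-8 is the default

my_decoded_str = my_str_as_bytes.decode()  # also defaults to UTF-8

print(isinstance(my_str_as_bytes, bytes))  # True
print(isinstance(my_decoded_str, str))     # True
print(my_decoded_str == my_str)            # True
```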

answered Oct 03 '22 03:10

hasanatkazmi
