What is the difference between a string and a byte string?

2 Answers

The only thing that a computer can store is bytes.

To store anything in a computer, you must first encode it, i.e. convert it to bytes. For example:

If you want to store music, you must first encode it using MP3, WAV, etc.
If you want to store a picture, you must first encode it using PNG, JPEG, etc.
If you want to store text, you must first encode it using ASCII, UTF-8, etc.

MP3, WAV, PNG, JPEG, ASCII and UTF-8 are examples of encodings. An encoding is a format for representing audio, images, text, etc. in bytes.

In Python, a byte string is just that: a sequence of bytes. It isn't human-readable. Under the hood, everything must be converted to a byte string before it can be stored in a computer.

On the other hand, a character string, often just called a "string", is a sequence of characters. It is human-readable. A character string can't be directly stored in a computer; it has to be encoded first (converted into a byte string). There are multiple encodings through which a character string can be converted into a byte string, such as ASCII and UTF-8.
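To make the "multiple encodings" point concrete, here is a small sketch. The sample word 'café' and the variable names are my own choices for illustration; the codec names are standard Python ones.

```python
text = 'café'

# The same four characters become different byte sequences per encoding.
utf8_bytes = text.encode('utf-8')        # 'é' takes two bytes: 0xC3 0xA9
latin1_bytes = text.encode('latin-1')    # 'é' takes one byte: 0xE9
utf16_bytes = text.encode('utf-16-le')   # each of these characters takes two bytes

print(utf8_bytes)     # b'caf\xc3\xa9'
print(latin1_bytes)   # b'caf\xe9'
print(len(text), len(utf8_bytes), len(latin1_bytes), len(utf16_bytes))  # 4 5 4 8
```

Note that the length of a character string counts characters, while the length of a byte string counts bytes, so the two only agree when every character happens to fit in one byte.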

'I am a string'.encode('ASCII')

The above Python code will encode the string 'I am a string' using the ASCII encoding. The result will be a byte string. If you print it, Python will represent it as b'I am a string'. Remember, however, that byte strings aren't human-readable; it's just that Python displays every byte in the printable ASCII range as the corresponding character when you print them (any other byte is shown as an escape such as \xff). In Python, a byte string is represented by a b, followed by the byte string's ASCII representation in quotes.
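A quick way to convince yourself that the b'...' form is only a display convention is to look at the underlying integers. This is a minimal sketch; the `raw` value is a made-up example, not something from the question.

```python
data = 'I am a string'.encode('ASCII')

print(type(data))       # <class 'bytes'>
print(data)             # b'I am a string'
print(data[0])          # indexing gives an integer: 73 (the code for 'I')
print(list(data[:4]))   # [73, 32, 97, 109]

# Bytes that are not printable ASCII are displayed as \xNN escapes instead.
raw = bytes([72, 105, 0, 255])
print(raw)              # b'Hi\x00\xff'
```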

A byte string can be decoded back into a character string, if you know the encoding that was used to encode it.

b'I am a string'.decode('ASCII')

The above code will return the original string 'I am a string'.

Encoding and decoding are inverse operations. Everything must be encoded before it can be written to disk, and it must be decoded before it can be read by a human.
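As a rough illustration of the disk round trip, the sketch below writes a string to a throwaway temporary file and reads it back both as raw bytes and as text. The sample text and variable names are assumptions of mine; `open`, `tempfile.mkstemp` and the `encoding` argument are standard library features.

```python
import os
import tempfile

original = 'naïve café'

# In memory: decoding with the same encoding undoes the encoding step.
assert original.encode('utf-8').decode('utf-8') == original

# Through a file: text mode encodes and decodes for you, binary mode does not.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'w', encoding='utf-8') as f:   # str in, bytes written to disk
        f.write(original)
    with open(path, 'rb') as f:                    # raw bytes, no decoding
        on_disk = f.read()
    with open(path, 'r', encoding='utf-8') as f:   # bytes decoded back to str
        read_back = f.read()
finally:
    os.remove(path)

print(on_disk)                 # b'na\xc3\xafve caf\xc3\xa9'
print(read_back == original)   # True
```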

answered Sep 23 '22 01:09

Zenadix

Assuming Python 3 (in Python 2, this difference is a little less well-defined) - a string is a sequence of characters, i.e. Unicode code points; these are an abstract concept, and can't be directly stored on disk. A byte string is a sequence of, unsurprisingly, bytes - things that can be stored on disk. The mapping between them is an encoding - there are quite a lot of these (and infinitely many are possible) - and you need to know which applies in the particular case in order to do the conversion, since a different encoding may map the same bytes to a different string:

>>> b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'.decode('utf-16')
'蓏콯캁澽苏'
>>> b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'.decode('utf-8')
'τoρνoς'

Once you know which one to use, you can use the .decode() method of the byte string to get the right character string from it as above. For completeness, the .encode() method of a character string goes the opposite way:

>>> 'τoρνoς'.encode('utf-8')
b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'
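It may also help to see what happens when the encoding cannot handle the data at all. The following is a sketch reusing the same bytes; the choice of ASCII as the "wrong" encoding and the use of errors='replace' are mine, picked only to show the failure modes.

```python
data = b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'

# Decoding fails if the bytes are not valid in the chosen encoding.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('decode failed:', exc.reason)

text = data.decode('utf-8')
print(len(data), len(text))   # 10 bytes, 6 characters

# Encoding fails if the target encoding has no byte sequence for a character.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('encode failed:', exc.reason)

# The errors argument swaps the exception for a lossy result.
lossy = text.encode('ascii', errors='replace')
print(lossy)                  # b'?o??o?'
```

The lossy result can't be decoded back into the original string, which is why it matters to pick an encoding that covers all the characters you need.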

answered Sep 25 '22 01:09

lvc

Tags:

python

string

character

byte

Sheldon
