I got this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I found this solution:
>>> b"abcde".decode("utf-8")
from here: Convert bytes to a Python string
But how do you use it if a) you don’t know where the 0xff is and/or b) you need to decode a file object? What is the correct syntax / format?
I am parsing through a directory, so I tried going through the files one at a time. (NOTE: This won't work when the project gets larger!!!)
>>> i = "b'0xff'"
>>> with open('firstfile') as f:
...     g=f.readlines()
...
>>> i in g
False
>>> 0xff in g
False
>>> '0xff' in g
False
>>> b'0xff' in g
False
>>> with open('secondfile') as f:
<snip - same process>
>>> with open('thirdfile') as f:
...     g = f.readlines()
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/local/lib/python3.4/codecs.py", line 313, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
So if this is the right file, and if I can't open it with Python (I opened it in Sublime Text and found nothing), how do I decode, or encode, this? Thanks.
The Python "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte" occurs when we specify an incorrect encoding when decoding a bytes object. To solve the error, specify the correct encoding, e.g. utf-16, or open the file in binary mode (rb or wb).
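To make that concrete, here is a minimal sketch of one common way this exact error shows up. It assumes the data is UTF-16 with a byte order mark (BOM); the sample bytes are made up for illustration, and the real file may well use some other encoding.

```python
import codecs

# Hypothetical sample data: "hello" as UTF-16 little-endian, preceded by a BOM.
data = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")

# The BOM is the two bytes FF FE, so byte 0 is 0xff.
print(data[:2])

# Decoding as UTF-8 fails on the very first byte.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = str(exc)
    print(utf8_error)

# The utf-16 codec reads the BOM, picks the byte order, and strips it.
text = data.decode("utf-16")
print(text)
```

If the bytes come from a file, the same idea applies by passing encoding='utf-16' to open, provided UTF-16 really is the file's encoding.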
Python bytes() function: the bytes() function returns a bytes object. It can convert objects into bytes objects, or create a zero-filled bytes object of the specified size.
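A short sketch of the three most common ways to call bytes(), in case it helps; the variable names are only illustrative.

```python
# An integer argument gives that many zero bytes.
zeros = bytes(3)

# An iterable of integers (0-255) gives one byte per integer.
single = bytes([0xff])

# A str argument must come with an encoding.
encoded = bytes("é", "utf-8")

print(zeros, single, encoded)
```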
The Python "UnicodeDecodeError: 'ascii' codec can't decode byte in position" occurs when we use the ascii codec to decode bytes that were encoded using a different codec. To solve the error, specify the correct encoding, e.g. utf-8.
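A minimal sketch of how the ascii variant of the error can be triggered and avoided. The sample string is an assumption chosen only because it contains one non-ASCII character.

```python
# "café" encoded as UTF-8: the "é" becomes the two bytes C3 A9.
raw = "café".encode("utf-8")

# The ascii codec only accepts bytes below 0x80, so it fails at the first byte of "é".
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
    print(ascii_error)

# Decoding with the codec that was used for encoding works.
text = raw.decode("utf-8")
print(text)
```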
You have a number of problems:
i = "b'0xff'" creates a string of 7 characters, not a single 0xFF byte. i = b'\xff' or i = bytes([0xff]) is the correct way to write it.
open defaults to decoding files using the encoding returned by locale.getpreferredencoding(False). Open in binary mode to get raw, undecoded bytes: open('firstfile','rb').
g=f.readlines() returns a list of lines. i in g checks whether the content of i exactly matches one of the lines in that list.
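The points above can be checked quickly in isolation. This is only a sketch: the list of lines is a made-up stand-in for what readlines() might return in binary mode.

```python
as_text = "b'0xff'"   # a str of 7 characters, not a byte
as_byte = b"\xff"     # a bytes object holding the single byte 0xFF

# Hypothetical result of f.readlines() on a file opened with 'rb'.
lines = [b"abc\xff\n", b"def\n"]

print(len(as_text), len(as_byte))

# List membership compares whole elements, so this is False:
# no line is exactly b'\xff'.
in_list = as_byte in lines

# Searching inside each line finds the byte.
in_any_line = any(as_byte in line for line in lines)

print(in_list, in_any_line)
```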
Instead:
byte = b'\xff'
with open('firstfile','rb') as f:
    file_content = f.read()
if byte in file_content:
    ...
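Since the question mentions going through a whole directory, the same check could be wrapped in a loop. The following is a rough sketch, not a tuned solution: files_containing is a hypothetical helper name, it only looks at the top level of the directory, and it reads each file fully into memory, which may not suit very large files.

```python
from pathlib import Path


def files_containing(directory, needle=b"\xff"):
    """Return the sorted names of regular files in `directory` whose raw bytes contain `needle`."""
    matches = []
    for path in Path(directory).iterdir():
        # Skip subdirectories; read_bytes() opens the file in binary mode,
        # so no decoding (and no UnicodeDecodeError) is involved.
        if path.is_file() and needle in path.read_bytes():
            matches.append(path.name)
    return sorted(matches)


# Example call (the directory name is a placeholder):
# print(files_containing("some_directory"))
```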
To decode a file, you need to know its correct encoding and provide it when you open the file:
with open('firstfile',encoding='utf8') as f:
    file_content = f.read()
If you don't know the encoding, the third-party chardet module can help you guess.
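If installing a third-party module is not an option, a cruder standard-library fallback is to try a short list of candidate encodings in order. To be clear, this is not what chardet does, and it can "succeed" with the wrong encoding; decode_guessing and the candidate list are assumptions made for this sketch.

```python
def decode_guessing(raw, candidates=("utf-8", "utf-16", "cp1252")):
    """Return (text, encoding) using the first candidate that decodes `raw` without error."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it never fails;
    # the result may still be meaningless.
    return raw.decode("latin-1"), "latin-1"


# Example use with a file opened in binary mode (the file name is a placeholder):
# with open("thirdfile", "rb") as f:
#     text, encoding = decode_guessing(f.read())
```

The order of the candidates matters: stricter encodings such as utf-8 should come first, because permissive ones accept almost any input.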