
Python: Can't read file encoded in ASCII

I generated a bug report on Android through ADB and extracted the large report file. But when I open and read that file in Python, it raises:

>>> f = open('bugreport.txt')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 12788794: invalid start byte

>>> f = open('bugreport.txt', encoding='ascii')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 5455694: ordinal not in range(128)

It seems that neither the UTF-8 nor the ASCII codec can decode the file.
Then I checked the file encoding with two commands:

$ enca bugreport.txt
7bit ASCII characters
$ file -i bugreport.txt
bugreport.txt: text/plain; charset=us-ascii

Both report that the file is encoded in ASCII, yet I can't open it with the ascii codec.
Some other clues:
1. The Python interpreter above is Python 3.6.3. I tried Python 2.7.14 and it worked.
2. If the file is opened with errors='ignore' and encoding='ascii', it can be read, but all Chinese characters are lost.

So how can I open this peculiar file in Python 3? Can anyone help me?

Michael asked Jan 24 '26 03:01


2 Answers

In Python 3 you can specify the encoding when opening the file, here inside a with block:

with open(file, encoding='utf-8') as f:
    data = f.read()
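
If the file is mostly UTF-8 with a few stray bytes (an assumption, though the traceback in the question shows the UTF-8 codec got as far as position 12788794 before failing), an error handler may let the read finish without dropping the valid multi-byte characters. A minimal sketch; `read_lenient` is a hypothetical helper name, not something from the question:

```python
def read_lenient(path, encoding='utf-8'):
    """Read text, keeping undecodable bytes visible as \\xNN escapes."""
    # errors='backslashreplace' substitutes each invalid byte with an escape
    # sequence instead of raising UnicodeDecodeError or silently dropping it.
    with open(path, encoding=encoding, errors='backslashreplace') as f:
        return f.read()
```

Calling `read_lenient('bugreport.txt')` should then return the whole report, with any byte that is not valid UTF-8 shown as an escape such as `\xc0`.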
Rahul answered Jan 25 '26 17:01


It's likely that the file is encoded as latin-1 or utf-16 (little-endian).

>>> bytes_ = [b'\xc0', b'\xef']
>>> for b in bytes_:
...     print(repr(b), b.decode('latin-1'))
... 
b'\xc0' À
b'\xef' ï
>>> bytes_ = [b'\xc0\x00', b'\xef\x00']
>>> for b in bytes_:
...     print(repr(b), b.decode('utf-16le'))
... 
b'\xc0\x00' À
b'\xef\x00' ï
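
To check a guess like this against the actual file, one option is to open it in binary mode and look at the raw bytes around the offset reported in the traceback (12788794 for the UTF-8 attempt in the question). A sketch only; `bytes_around` and its `radius` parameter are hypothetical names chosen for illustration:

```python
def bytes_around(path, offset, radius=16):
    """Return the raw bytes within `radius` bytes of `offset`."""
    # Clamp the start so an offset near the beginning does not go negative.
    start = max(0, offset - radius)
    # Binary mode performs no decoding, so this cannot raise UnicodeDecodeError.
    with open(path, 'rb') as f:
        f.seek(start)
        return f.read(offset - start + radius + 1)
```

For example, `print(bytes_around('bugreport.txt', 12788794))` prints the bytes surrounding the one that failed to decode, which can then be tried against candidate codecs with `.decode(...)`.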
snakecharmerb answered Jan 25 '26 16:01


