I generated a bugreport on Android through ADB and pulled the large report file. But when I try to open and read that file, Python raises an error:
>>> f = open('bugreport.txt')
>>> f.read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 12788794: invalid start byte
>>> f = open('bugreport.txt', encoding='ascii')
>>> f.read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 5455694: ordinal not in range(128)
It seems that neither the UTF-8 nor the ASCII codec can decode the file.
Then I checked the file's encoding with two commands:
$ enca bugreport.txt
7bit ASCII characters
$ file -i bugreport.txt
bugreport.txt: text/plain; charset=us-ascii
Both report that the file is plain ASCII, yet I can't decode it with the ASCII codec.
Some other clues:
1. The interpreter above is Python 3.6.3. Python 2.7.14 reads the file without any problem.
2. If the file is opened with encoding='ascii' and errors='ignore', it can be read, but all the Chinese characters are lost.
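For example (a made-up byte string, not the real file contents), ASCII with errors='ignore' simply throws away every non-ASCII byte:
data = '报告'.encode('utf-8')                 # UTF-8 bytes of two Chinese characters
print(data.decode('ascii', errors='ignore'))  # prints an empty string: every non-ASCII byte is dropped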
So how can I open this peculiar file in Python 3? Can anyone help me?
In Python 3 you can specify the encoding when you open the file:
with open(file, encoding='latin-1') as f:
    data = f.read()
It's likely that the file is encoded as latin-1 or utf-16 (little-endian):
>>> bytes_ = [b'\xc0', b'\xef']
>>> for b in bytes_:
...     print(repr(b), b.decode('latin-1'))
...
b'\xc0' À
b'\xef' ï
>>> bytes_ = [b'\xc0\x00', b'\xef\x00']
>>> for b in bytes_:
...     print(repr(b), b.decode('utf-16le'))
...
b'\xc0\x00' À
b'\xef\x00' ï
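If you want to confirm the guess before decoding the whole file, here is a minimal sketch (assuming bugreport.txt is in the current working directory; the offset 12788794 is copied from the traceback in the question) that looks at the raw bytes around the position that broke the UTF-8 decoder:
# Peek at the raw bytes around the offset reported in the UnicodeDecodeError.
offset = 12788794  # taken from the traceback in the question

with open('bugreport.txt', 'rb') as f:
    f.seek(max(offset - 20, 0))
    chunk = f.read(40)

print(chunk)                    # the raw bytes around the failure
print(chunk.decode('latin-1'))  # latin-1 maps every byte value, so this never raises
print(chunk.decode('utf-16le', errors='replace'))  # compare against the utf-16 guess
Whichever decoding yields readable text around that position is the one to pass to open().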