While reading a file in Python, I got a UnicodeDecodeError. What can I do to resolve this?

This is one of my own projects. Later on it will help other players of a game I play (AssaultCube). Its purpose is to break down the log file and make it easier for users to read.

I keep getting this error. Does anyone know how to fix it? At the moment I am not trying to write or create a file; I just want to read this one without the error.

The line that triggered the error is a blank line (it stopped on line 66346).

This is what the relevant part of my script looks like:

    log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r')
    for line in log:
        ...

And the exception is:

    Traceback (most recent call last):
      File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 159, in <module>
        main()
      File "C:\Users\Owner\Desktop\Exodus Logs\Log File Translater.py", line 7, in main
        for line in log:
      File "C:\Python32\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3074: character maps to <undefined>
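The traceback shows the cp1252 codec at work: because `open()` was called without `encoding=`, Python 3 fell back to the platform's preferred locale encoding (cp1252 on this Windows setup), and cp1252 leaves byte 0x81 undefined. Text-mode files are decoded in buffered chunks, so "position 3074" is most likely an offset into the chunk being decoded rather than into a line, which would explain why the loop appeared to stop on a blank line. A minimal sketch for locating the real culprit reads the file in binary mode and decodes it line by line; the helper name `find_undecodable_lines` is hypothetical, and cp1252 is assumed as the default only because that is what the traceback shows.

```python
def find_undecodable_lines(path, encoding="cp1252"):
    """Yield (line_number, column, bad_byte) for every line of `path`
    whose raw bytes cannot be decoded with `encoding`."""
    with open(path, "rb") as f:  # binary mode: no decoding happens while reading
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first bad byte within this line
                yield lineno, exc.start, raw[exc.start:exc.start + 1]
```

Running `for hit in find_undecodable_lines(path): print(hit)` on the log would list each offending line number together with the byte that cp1252 rejects.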
asked May 13 '13 18:05 by Bugboy1028


People also ask

What does UnicodeDecodeError mean in Python?

The Python error "UnicodeDecodeError: 'ascii' codec can't decode byte ... in position ..." occurs when the ascii codec is used to decode bytes that were encoded with a different codec. Other codecs fail the same way; in the question above it is the 'charmap' (cp1252) codec. To solve the error, specify the correct encoding, e.g. utf-8.
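A small illustration of that mismatch, using the word "über" purely as sample data: the bytes are produced by UTF-8, so ASCII rejects them and UTF-8 reads them back correctly.

```python
# "über" encoded as UTF-8 is b'\xc3\xbcber'; the first two bytes are non-ASCII.
data = "über".encode("utf-8")

try:
    data.decode("ascii")  # wrong codec for these bytes
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

print(data.decode("utf-8"))  # über (the codec that produced the bytes)
```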

What is a Unicode error in Python?

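A "Unicode error" in Python usually means a SyntaxError reported as "(unicode error) 'unicodeescape' codec can't decode bytes". It happens when a normal string literal contains a backslash followed by "u" or "U", because Python reads "\u" and "\U" as the start of a Unicode escape sequence (\uXXXX or \UXXXXXXXX). Windows paths are the typical trigger: in "C:\Users\Owner", "\U" is followed by "sers", which are not hex digits. Write such paths as raw strings (r"C:\Users\Owner") or with forward slashes.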
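A sketch that reproduces the error without crashing the script: `compile()` is used on the text of a one-line program so the SyntaxError can be caught, and the two usual fixes follow. The path is only an example.

```python
# Text of a one-line program: path = "C:\Users\Owner"
source = 'path = "C:\\Users\\Owner"'

try:
    compile(source, "<example>", "exec")
    rejected = False
except SyntaxError as exc:
    # "\U" starts an 8-hex-digit escape, so the literal is rejected
    rejected = True
    print(exc)

# Two fixes: a raw string keeps backslashes literally, or use forward slashes.
raw_path = r"C:\Users\Owner"
slash_path = "C:/Users/Owner"
print(raw_path)  # C:\Users\Owner
```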

What is surrogateescape?

surrogateescape is a codec error handler, passed as errors='surrogateescape'. It handles decoding errors by squirreling the data away in a little-used part of the Unicode code point space (the lone surrogates U+DC80 to U+DCFF). When encoding, it translates those hidden-away values back into the exact original byte sequence that failed to decode correctly.
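A round-trip sketch of that behaviour on a made-up log line containing the byte 0x81 from the question. The same handler can be passed to `open()`, e.g. `open(path, encoding="utf-8", errors="surrogateescape")`, so that reading never stops on a bad byte.

```python
raw = b"player \x81 joined\n"  # 0x81 is not valid UTF-8 on its own

# Decoding: the bad byte 0x81 becomes the lone surrogate U+DC81 instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))  # 'player \udc81 joined\n'

# Encoding with the same handler restores the exact original bytes.
assert text.encode("utf-8", errors="surrogateescape") == raw
```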


1 Answer

Try:

    enc = 'utf-8'
    log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)

If that doesn't work, try:

    enc = 'utf-16'
    log = open('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt', 'r', encoding=enc)

You could also try it with:

    enc = 'iso-8859-15'

Also try:

    enc = 'cp437'

This one is very old, but it has "ü" at 0x81, which would fit the string "üßer" that I found on the AssaultCube homepage.
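The trial and error above can be automated. A minimal sketch, assuming the whole log fits in memory: it reads the raw bytes once and tries each encoding suggested in this answer. The helper name `preview_decodings` and the 60-character preview are illustrative choices. A clean decode is only a first filter: single-byte codecs such as iso-8859-15 and cp437 accept every byte, and utf-16 accepts many even-length byte strings, so the previews still have to be judged by eye.

```python
# Candidate encodings suggested in the answer; extend as needed.
CANDIDATES = ["utf-8", "utf-16", "iso-8859-15", "cp437"]


def preview_decodings(path, candidates=CANDIDATES, preview_len=60):
    """Try each candidate encoding on the raw bytes of `path`.

    Returns {encoding: preview_text} for every encoding that decodes the
    whole file without raising. A clean decode does not prove the encoding
    is right, so read the previews and pick the one that looks sensible.
    """
    with open(path, "rb") as f:
        data = f.read()
    results = {}
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot be right for the file
        results[enc] = text[:preview_len]
    return results
```

For example, `preview_decodings('/Users/Owner/Desktop/Exodus Logs/DIRTYLOGS/serverlog_20130430_00.15.21.txt')` returns a dict mapping each surviving encoding to the start of the text it produced.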

If none of these encodings is right, try contacting the AssaultCube developers or, as mentioned in a comment, have a look at chardet: https://pypi.python.org/pypi/chardet
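A sketch of the chardet route, assuming the third-party chardet package is installed (`pip install chardet`). `chardet.detect()` takes a bytes object and returns a dict that includes 'encoding' and 'confidence' keys; the helper `guess_encoding` and the 100,000-byte sample size are illustrative, not part of chardet. The result is a statistical guess, so it is worth checking the confidence and the decoded text before relying on it.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:
    chardet = None


def guess_encoding(path, sample_size=100_000):
    """Return (encoding, confidence) as guessed by chardet from the first
    `sample_size` bytes of `path`. encoding may be None if chardet has no guess."""
    if chardet is None:
        raise RuntimeError("chardet is not installed (pip install chardet)")
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    result = chardet.detect(sample)
    return result["encoding"], result["confidence"]
```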

answered Sep 22 '22 19:09 by Rouven B.