 

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 55: character maps to <undefined>

I am new to Python and am hoping that someone could please explain to me what the error message means.

To be specific, I have some combined Python and SPSS code, saved in Atom, that was written by a former colleague. Since that colleague has left, I now need to run it myself. I ran the code below from SPSS 22.

    begin program.
    import spss,spssaux,imp
    abcvalid = imp.load_source('abcvalid', "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py") 
    import abcvalid
    abcvalid.fullprocess("9_26_2016","M:/Users/Yli\2016 SURVEY/DOWNLOADS/9_26_2016/","M:/Users/Yli/2016 SURVEY/Legacy15.sav")
    end program.

Then I got the following from the output.

    Traceback (most recent call last):
      File "<string>", line 5, in <module>
      File "I:/VALIDITY CHECK/Python Library/2016/abcnvalid2016.py", line 2067, in fullprocess
        dataprep(date,filepath,legacypath)
      File "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py", line 2006, in dataprep
        emailslower(date,filepath)
      File "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py", line 1635, in emailslower
        DATASET ACTIVATE comment_data.""".format(date,filepath))
      File "C:\PROGRA~1\IBM\SPSS\STATIS~1\22\Python\Lib\site-packages\spss\spss.py", line 1494, in Submit
        cmdList = spssutil.CheckStr(cmdList)
      File "C:\PROGRA~1\IBM\SPSS\STATIS~1\22\Python\Lib\site-packages\spss\spssutil.py", line 166, in CheckStr
        s1 = unicode(mystr,locale.getlocale(locale.LC_CTYPE)[1])
      File "C:\Program Files\IBM\SPSS\Statistics\22\Python\lib\encodings\cp1252.py", line 15, in decode
        return codecs.charmap_decode(input,errors,decoding_table)
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 55: character maps to <undefined>
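
One detail in the call above may be worth checking: the second argument to `fullprocess` is written as `"M:/Users/Yli\2016 SURVEY/..."`, with one backslash where every other path uses forward slashes. In an ordinary (non-raw) Python string literal, a backslash followed by up to three octal digits is an escape sequence, so `\201` becomes the single character 0x81, the same byte the error names. The traceback also shows `filepath` being formatted into the command passed to `spss.Submit`. The sketch below is a Python 3 illustration that assumes `abcvalid2016.py` passes the path through unchanged; the `unicode(...)` call in the traceback indicates that SPSS 22 runs Python 2, where the same literal produces the raw byte 0x81 directly.

```python
# The output folder exactly as typed in the question's fullprocess() call.
filepath = "M:/Users/Yli\2016 SURVEY/DOWNLOADS/9_26_2016/"

# "\201" is an octal escape (0o201 == 0x81): "\2016" becomes U+0081 followed by "6".
print(repr(filepath))
print(filepath[12] == "\x81", filepath[13] == "6")  # True True

# Python 2 stores that character as the raw byte 0x81; latin-1 reproduces those bytes here.
raw = filepath.encode("latin-1")

# Decoding as cp1252, as spssutil.CheckStr did, fails on that byte.
error_message = None
try:
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# With forward slashes throughout there is no stray byte, and decoding succeeds.
fixed = "M:/Users/Yli/2016 SURVEY/DOWNLOADS/9_26_2016/"
print(fixed.encode("latin-1").decode("cp1252") == fixed)  # True
```

If this is the cause, writing that path with forward slashes throughout (or as a raw string, `r"..."`) should avoid the error. The variable names `filepath`, `raw` and `fixed` are illustrative only.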

I know there are similar questions on this site, but the questions and answers were too hard for me to comprehend. If someone could please help me, I'd really appreciate it!

Thank you in advance!

asked Dec 07 '22 21:12 by user6655908


2 Answers

For a similar problem with the same error message, I did something like this, and it worked well for me:

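    # workfile is the path of the file you want to read
    with open(workfile, 'r', encoding='utf-8') as f:
        read_data = f.read()
    # no f.close() needed: the with block already closes the file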
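
To see why naming the encoding matters, here is a small self-contained sketch; the temporary file and the sample word `Māori` are made up for illustration. The letter ā is stored in UTF-8 as the bytes C4 81, and because cp1252 (a common Windows default, and the codec in the question's traceback) has no character for byte 0x81, reading the file as cp1252 fails with the same kind of error.

```python
import os
import tempfile

# Create a small UTF-8 file. "ā" (U+0101) is stored as the two bytes C4 81.
fd, workfile = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(workfile, "w", encoding="utf-8") as f:
    f.write("Māori\n")

# Reading it as cp1252 fails, because cp1252 has no character for byte 0x81.
try:
    with open(workfile, "r", encoding="cp1252") as f:
        f.read()
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print("cp1252:", exc)

# Reading it with the encoding it was written in works.
with open(workfile, "r", encoding="utf-8") as f:
    read_data = f.read()
print("utf-8 read ok:", read_data == "Māori\n")

os.remove(workfile)
```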
answered Jan 17 '23 19:01 by andsa


First, here is a minimal example reproducing your error on Windows:

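    import subprocess

    # cmd's echo writes "ü" in the console's code page; text=True decodes it
    # with Python's default locale encoding (e.g. cp1252), which can fail.
    with subprocess.Popen("cmd /c echo ü", stdout=subprocess.PIPE, text=True) as process:
        for line in process.stdout:
            print(line, end="")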

To my understanding, the problem is as follows. (I have put together information and examples I found, but I am not certain everything is correct; corrections are welcome.)

  • The ü character is code point 252 = 0xfc in Unicode (https://unicode-table.com/en/00FC/).
  • Python correctly passes the ü character to the console, as you can test with this example (be sure to save the file as UTF-8):
    import subprocess

    print(ord('ü'))  # 252
    subprocess.call("cmd /c echo ü")

I am not sure why this works in the first place (this answer may explain it: https://stackoverflow.com/a/32176732/880783).

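  • The console does not use Unicode here; it uses an OEM code page such as cp437 or cp850 (often loosely called "extended ASCII"), in which ü is at position 129 = 0x81 (sounds familiar?).
  • So when the console returns that byte, Python decodes it with its default encoding, which is cp1252 on many Western Windows systems (compare the cp1252.py frame in the question's traceback), and 0x81 is not defined in cp1252. Hence the error.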
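
The points above can be checked without a console by encoding and decoding in Python directly. This sketch assumes the console uses an OEM code page such as cp850 or cp437 (both store ü as 0x81) while Python falls back to cp1252; it does not query the real console settings.

```python
# Unicode code point of ü.
print(hex(ord("ü")))  # 0xfc

# How each encoding stores ü as bytes.
encoded = {codec: "ü".encode(codec) for codec in ("utf-8", "cp1252", "cp850", "cp437")}
for codec, data in encoded.items():
    print(f"{codec:>6}: {data!r}")

# A 0x81 byte from the console: cp850 reads it back as ü, cp1252 has no character for it.
print(b"\x81".decode("cp850") == "ü")  # True
try:
    b"\x81".decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False
print("cp1252 can decode 0x81:", cp1252_ok)  # False
```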

The key is to tell Python how the output it receives from the process is encoded. For my example (Windows console), I tried a couple of encodings (see the list here) like this:

    import subprocess

    encoding = 'cp850'  # swap in other encodings to compare
    with subprocess.Popen("cmd /c echo ü", stdout=subprocess.PIPE, text=True, encoding=encoding) as process:
        for line in process.stdout:
            print(line, end="")

  • 'ascii' fails with an "ordinal not in range(128)" error (ASCII only covers bytes 0 to 127, so 0x81 is out of range).
  • 'cp1252' fails with "character maps to <undefined>".
  • 'latin_1' does not raise an error, but it outputs a box character on my debug console in VS Code (0x81 becomes the control character U+0081, not ü).
  • 'cp850' seems to work, outputting a ü character.
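
The trial of encodings above can also be reproduced on any platform by decoding the byte directly instead of running `cmd`. This sketch assumes the console emitted the single byte 0x81 for ü, as explained earlier; it is not captured from a real console.

```python
# Assumed console output: the single OEM byte 0x81 that cmd.exe wrote for "ü".
raw = b"\x81"

results = {}
for codec in ("ascii", "cp1252", "latin_1", "cp850"):
    try:
        # ascii() keeps the printed result readable on any terminal
        results[codec] = "ok: " + ascii(raw.decode(codec))
    except UnicodeDecodeError as exc:
        results[codec] = "error: " + exc.reason

for codec, outcome in results.items():
    print(f"{codec:>7}: {outcome}")
```

The outcomes line up with the list: latin_1 never fails because it maps every byte to some character, but it yields U+0081 rather than ü, which is why it displayed as a box.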

So I will stick with 'cp850' for now and see how it goes.

answered Jan 17 '23 20:01 by bers