Python - UnicodeEncodeError: 'charmap' codec can't encode characters in position 85-89: character maps to <undefined>

I am trying to save the output of urllib.request.urlopen() to a text file just to look at it. I decoded the output into a string so that I could write it to a file, but the original output apparently includes some Korean characters that are not being written to the file properly.

So far I have:

from urllib.request import urlopen

openU = urlopen(myUrl)
pageH = openU.read()
openU.close()
stringU = pageH.decode("utf-8")

f=open("test.txt", "w+")
f.write(stringU)

I do not get any errors until the last step at which point it says:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Chae\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 85-89: character maps to <undefined>

Is there a way to get the string to also include Korean or if not, how do I skip the characters causing problems and write the rest of the string into the file?

Chae asked Apr 06 '18 01:04


People also ask

How do I fix UnicodeEncodeError in Python?

A codec such as cp1252 can represent only a limited set of Unicode characters. Any character outside that set cannot be encoded and raises UnicodeEncodeError. To avoid the error, use an encoding that covers the text, such as UTF-8, for example by passing encoding="utf-8" to open() or to str.encode().

What is Charmap codec in Python?

The Python "UnicodeEncodeError: 'charmap' codec can't encode characters in position" error occurs when a string is encoded to bytes with a codec that cannot represent some of its characters. To solve it, specify a suitable encoding, e.g. utf-8, when opening the file or encoding the string.
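
As a rough illustration of how the error arises, the sketch below encodes a short string with cp1252 and then with UTF-8. The sample text is hypothetical and only stands in for the Korean characters on the page in the question.

```python
# Hypothetical sample text standing in for the downloaded page.
text = "Korean: 한국어"

error = None
try:
    # cp1252 has no mapping for Hangul, so strict encoding fails.
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# UTF-8 can encode every Unicode character, so this succeeds.
encoded = text.encode("utf-8")
```

Writing to a file opened without an explicit encoding performs the same kind of encode step behind the scenes, using the platform default.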

Does Python use UTF-8?

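UTF-8 is one of the most commonly used encodings, and Python 3 uses it by default for source files and for str.encode() and bytes.decode(). The default for open(), however, has traditionally come from the system locale, which is why a Windows system may use cp1252 instead. UTF stands for “Unicode Transformation Format”, and the '8' means that the encoding works in 8-bit units, with each character taking one to four of them.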
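
A small sketch of what the 8-bit units mean in practice: it counts the UTF-8 bytes for a few sample characters (chosen here purely for illustration) and looks up the locale's preferred encoding, which differs from machine to machine.

```python
import locale

# Sample characters: ASCII letter, accented Latin letter, Hangul syllable, emoji.
samples = ["A", "\u00e9", "\ud55c", "\U0001F600"]

# Number of 8-bit units (bytes) UTF-8 needs for each character.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(byte_lengths)

# The locale's preferred encoding; the value depends on the system
# (for example cp1252 on many Windows installations).
default_file_encoding = locale.getpreferredencoding(False)
print(default_file_encoding)
```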


1 Answer

Does it matter to you what the file encoding is? If not, then use utf-8 encoding:

# The with block closes the file, which also flushes the written text to disk.
with open("test.txt", "w", encoding="utf-8") as f:
    f.write(stringU)
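
To check that nothing is lost with UTF-8, one option is to write the text and read it back. This is a minimal sketch, not the asker's actual data: string_u is a hypothetical stand-in for the decoded page, and a temporary directory is used so that no existing file is touched.

```python
import os
import tempfile

# Hypothetical stand-in for the decoded page content.
string_u = "Title: 한국어 페이지"

# Write into a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write(string_u)

# Read it back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    round_tripped = f.read()

print(round_tripped == string_u)
```

The file has to be read with encoding="utf-8" as well, and a text editor needs to be told (or detect) that the file is UTF-8 for the Korean characters to display correctly.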

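If you want the file to stay cp1252-encoded, which is apparently the default on your system, and to silently drop the characters that cp1252 cannot encode, add errors="ignore":

# No encoding is given, so the system default (cp1252 here) is still used.
with open("test.txt", "w", errors="ignore") as f:
    f.write(stringU)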
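
For comparison, here is a short sketch of how a few of the standard error handlers behave when encoding to cp1252. The sample string is hypothetical, and cp1252 is named explicitly so that the result does not depend on the system default. The same handler names can be passed as errors= to open().

```python
# Hypothetical sample text containing characters cp1252 cannot encode.
text = "name: 한국어"

# "ignore" drops the unencodable characters entirely.
ignored = text.encode("cp1252", errors="ignore")

# "replace" substitutes a question mark for each unencodable character.
replaced = text.encode("cp1252", errors="replace")

# "xmlcharrefreplace" writes a numeric character reference such as &#54620;.
escaped = text.encode("cp1252", errors="xmlcharrefreplace")

print(ignored)
print(replaced)
print(escaped)
```

With "ignore" there is no trace of the lost characters in the output, so "replace" or "xmlcharrefreplace" may be preferable when it matters to see where text was dropped.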

Robᵩ answered Oct 05 '22 11:10