
UnicodeEncodeError: 'charmap' codec can't encode characters

I'm trying to scrape a website, but it gives me an error.

I'm using the following code:

    import urllib.request
    from bs4 import BeautifulSoup

    get = urllib.request.urlopen("https://www.website.com/")
    html = get.read()

    soup = BeautifulSoup(html)

    print(soup)

And I'm getting the following error:

    File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 70924-70950: character maps to <undefined>

What can I do to fix this?

SstrykerR asked Nov 23 '14 18:11


People also ask

How do I fix UnicodeEncodeError?

To fix UnicodeEncodeError: 'charmap' codec can't encode characters when writing a file in Python, set the encoding argument when opening the file. For example, call open with the file path fname and the encoding argument set to utf-8, so that the text is written to fname as UTF-8.

What is Charmap error in Python?

The Python "UnicodeEncodeError: 'charmap' codec can't encode characters in position" error occurs when a string is encoded to bytes with a codec that cannot represent some of its characters, for example the Windows default cp1252. To solve the error, specify a suitable encoding such as utf-8 when opening the file or encoding the string.
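The error can be reproduced without a console or a file by encoding directly with cp1252. This is only a minimal sketch; the sample string is made up for illustration.

```python
# Reproduce the error by encoding with cp1252 directly.
text = "snowman: \u2603"  # U+2603 has no mapping in cp1252

try:
    text.encode("cp1252")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    reason = exc.reason  # the part of the message after the position

# UTF-8 can encode every Unicode character, so this succeeds.
utf8_bytes = text.encode("utf-8")
```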

How do you stop Unicode errors in Python?

A legacy codec such as cp1252 can represent only a limited number of Unicode characters. Any character the codec has no mapping for causes encoding to fail and raise UnicodeEncodeError. To avoid this error, encode and decode explicitly with a codec that covers all of Unicode, e.g. str.encode("utf-8") and bytes.decode("utf-8").
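As a rough illustration, here is a UTF-8 round trip next to two lossy fallbacks for cases where a legacy codec really is required. The sample string is an assumption; errors="replace" and errors="backslashreplace" are standard error handlers of str.encode.

```python
text = "caf\u00e9 \u2192 \u2603"  # the e-acute exists in cp1252; the arrow and snowman do not

# A round trip through UTF-8 is lossless.
round_tripped = text.encode("utf-8").decode("utf-8")

# Lossy fallbacks when the target must be cp1252:
replaced = text.encode("cp1252", errors="replace")          # unmappable characters become "?"
escaped = text.encode("cp1252", errors="backslashreplace")  # unmappable characters become \uXXXX escapes
```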


2 Answers

I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

    with open(fname, "w") as f:
        f.write(html)

with this:

    with open(fname, "w", encoding="utf-8") as f:
        f.write(html)

If you need to support Python 2, then use this:

    import io
    with io.open(fname, "w", encoding="utf-8") as f:
        f.write(html)

If the file needs to be in an encoding other than UTF-8, pass the name of that encoding as the encoding argument instead.
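For anyone who wants to check the fix in isolation, here is a small self-contained sketch. The html string is a made-up stand-in for scraped text, and the temporary file takes the place of fname.

```python
import os
import tempfile

html = "<p>price: 10\u20ac, arrow: \u2192</p>"  # stand-in for scraped text

# Scratch file for the demonstration; in real code fname is your own path.
fd, fname = tempfile.mkstemp(suffix=".html")
os.close(fd)
try:
    # Write with an explicit encoding instead of the platform default.
    with open(fname, "w", encoding="utf-8") as f:
        f.write(html)
    # Read back with the same encoding to confirm nothing was lost.
    with open(fname, "r", encoding="utf-8") as f:
        round_tripped = f.read()
finally:
    os.remove(fname)
```

Reading the file back with the same encoding matters as much as writing it: opening it without encoding="utf-8" on Windows would decode the bytes with the default code page again.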

twasbrillig answered Oct 06 '22 01:10


I fixed it by adding .encode("utf-8") to soup.

That means that print(soup) becomes print(soup.encode("utf-8")).
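Note that in Python 3, print on a bytes object shows its repr (b'...' with escape sequences) rather than the decoded text. One possible alternative, sketched below, is to write the UTF-8 bytes to the stream's binary buffer. The helper name print_utf8 is hypothetical, and the in-memory stream only simulates a cp1252 console. This helps mainly when output is redirected to a file or the terminal understands UTF-8; a real cp1252 console would still display the bytes incorrectly.

```python
import io
import sys


def print_utf8(text, stream=None):
    """Write text as UTF-8 bytes to the stream's underlying binary buffer."""
    stream = sys.stdout if stream is None else stream
    stream.flush()  # keep ordering with any text already written via print()
    stream.buffer.write(text.encode("utf-8") + b"\n")
    stream.buffer.flush()


# In-memory stand-in for a console whose text encoding is cp1252.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp1252")

print_utf8("snowman: \u2603", stream=console)  # print() to this stream would raise
captured = raw.getvalue()
```

Setting the PYTHONIOENCODING environment variable to utf-8 before starting Python is another way to change the encoding that print uses, without touching the code.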

SstrykerR answered Oct 06 '22 00:10