Python 3 UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d

Question

I want to build a search engine and I'm following a tutorial I found on the web. I want to test parsing HTML:

```python
from bs4 import BeautifulSoup

def parse_html(filename):
    """Extract the Author, Title and Text from a HTML file
    which was produced by pdftotext with the option -htmlmeta."""
    with open(filename) as infile:
        html = BeautifulSoup(infile, "html.parser", from_encoding='utf-8')
        d = {'text': html.pre.text}
        if html.title is not None:
            d['title'] = html.title.text
        for meta in html.findAll('meta'):
            try:
                if meta['name'] in ('Author', 'Title'):
                    d[meta['name'].lower()] = meta['content']
            except KeyError:
                continue
        return d

parse_html("C:\pdf\pydf\data\muellner2011.html")
```

and it raises this error:

```
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 867: character maps to <undefined>
```

I saw some solutions on the web that use encode(), but I don't know where to put the encode() call in my code. Can anyone help me?

Martijn Pieters · Accepted Answer

In Python 3, files are opened as text (decoded to Unicode) for you; you don't need to tell BeautifulSoup what codec to decode from.
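A minimal standard-library sketch of the difference between text mode and binary mode may help here. The file name `sample.html` and its contents are made up for illustration. In text mode `open()` returns already-decoded `str`; only in binary mode do you get raw `bytes` that a parser (or you) would still have to decode, which is the only situation where telling a parser such as BeautifulSoup which codec to use can make a difference.

```python
import os
import tempfile

# Hypothetical sample file: U+201D (right double quotation mark) is stored
# in UTF-8 as the three bytes E2 80 9D.
html_text = "<html><title>\u201dQuoted\u201d</title></html>"
path = os.path.join(tempfile.mkdtemp(), "sample.html")
with open(path, "wb") as f:
    f.write(html_text.encode("utf-8"))

# Text mode: open() decodes the bytes for you and hands back str.
with open(path, encoding="utf-8") as f:
    as_text = f.read()

# Binary mode: you get the raw bytes and must decode them yourself.
with open(path, "rb") as f:
    as_bytes = f.read()

print(type(as_text).__name__, type(as_bytes).__name__)  # str bytes
print(as_bytes.decode("utf-8") == as_text)  # True
```

The same point answers the encode() question: `str.encode()` turns text into bytes, but the error happens earlier, while the bytes on disk are being decoded into text, so calling encode() afterwards cannot help.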

If decoding of the data fails, that's because you didn't tell the open() call what codec to use when reading the file; add the correct codec with an encoding argument:

```python
with open(filename, encoding='utf8') as infile:
    html = BeautifulSoup(infile, "html.parser")
```

otherwise the file will be opened with your system default codec, which is OS dependent.
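A small sketch that reproduces the error and the fix, under stated assumptions: the HTML file is UTF-8 and contains curly quotes, and `cp1252` stands in for a typical Windows default codec (Python decodes cp1252 with its 'charmap' codec). The closing quote U+201D is encoded in UTF-8 as E2 80 9D, and byte 0x9D has no mapping in cp1252, which matches the error in the question. The helper `read_with` and the file contents are hypothetical.

```python
import locale
import os
import tempfile

# Hypothetical UTF-8 file containing curly quotes (U+201C / U+201D).
path = os.path.join(tempfile.mkdtemp(), "sample.html")
with open(path, "w", encoding="utf-8") as f:
    f.write("<pre>He said \u201chello\u201d</pre>")

def read_with(encoding):
    """Read the whole file with the given codec; return the text or the error."""
    try:
        with open(path, encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        return exc

# The codec open() uses when no encoding is given (OS and locale dependent).
print(locale.getpreferredencoding(False))

print(read_with("cp1252"))  # 'charmap' codec can't decode byte 0x9d ...
print(read_with("utf-8"))   # <pre>He said “hello”</pre>
```

Passing the file's real encoding to `open()` is the whole fix; nothing needs to be encoded or decoded by hand afterwards.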


Tags: python, unicode

Fakhriyanto
