Python: Output of beautifulsoup has wrong encoding

Question

I ran into an encoding problem when passing a response to BeautifulSoup. The raw response reads correctly, e.g. Artikelstandort: Österreich, but after running it through BeautifulSoup it becomes Artikelstandort: Ã–sterreich. Here is my current code:

def formTest (browser, formUrl, cardName, edition):
   browser.open (formUrl)

   data = browser.response().read()
   with open ('analyze.txt', 'wb') as textFile:
      print 'writing file'
      textFile.write (data)

   #BS4 -> need from_encoding
   soup = BeautifulSoup (data, from_encoding = 'latin-1')
   soup = soup.encode ('latin-1').decode('utf-8')
   table = soup.find('table', { "class" : "MKMTable specimenTable"})

data has the correct data, but the soup has the wrong encoding. I tried various encoding/decoding on the soup, but got no working result.

The page where I pull my data from is: https://www.magickartenmarkt.de/Mutilate_Magic_2013.c1p256992.prod

Edit: I changed the encoding with prettify as suggested, but now I'm facing the following error:

TypeError: slice indices must be integers or None or have an __index__ method

What did prettify change? I printed the new output and the table is still in the "soup" (<table class="MKMTable specimenTable">).

Edit2:

New error is:

at: soup.encode ('latin-1').decode('utf-8')

Error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 518: invalid start byte

If I play with the encodings and decodings, I get similar errors about decoding some other byte.

Kaan E. · Accepted Answer

You probably don't need the solution anymore, but in case anyone else stops by, here is what you should do:
You should probably apply the encoding and decoding steps to data (the raw bytes) and not to soup.
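To illustrate why the decoding belongs on the raw bytes, here is a minimal sketch in Python 3 syntax. It assumes the page is served as UTF-8, which the question does not confirm. Under that assumption, decoding the bytes with a single-byte codec such as cp1252 reproduces the Ã– symptom. The sketch also shows that once soup has been turned into a plain string (which is what soup.encode(...).decode(...) produces), .find('table', {...}) is the string method, and that raises the TypeError from the first edit. The variable names are illustrative only.

```python
# Raw bytes as a server would send them, ASSUMING the page is UTF-8.
data = u'Artikelstandort: Österreich'.encode('utf-8')

# Decoding with the wrong (single-byte) codec produces the mojibake.
wrong = data.decode('cp1252')
assert wrong == u'Artikelstandort: Ã–sterreich'

# Decoding the same bytes with the right codec gives the expected text.
right = data.decode('utf-8')
assert right == u'Artikelstandort: Österreich'

# A plain string is not a soup: str.find() expects an integer start index,
# so passing an attribute dict raises TypeError.
page = u'<table class="MKMTable specimenTable"></table>'
try:
    page.find('table', {'class': 'MKMTable specimenTable'})
    find_error = None
except TypeError as exc:
    find_error = exc

assert isinstance(find_error, TypeError)
```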
What I usually do is use the requests library to get the raw response, enforce the encoding with response.encoding = 'utf-8', and then take the decoded text content from response.text.
Finally, I feed response.text to BeautifulSoup().
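The approach could be sketched roughly as follows, in Python 3 syntax. The helper names soup_from_response and fetch_soup, the 'utf-8' default, the timeout, and the 'html.parser' choice are assumptions made for illustration, not part of the original code; use whatever encoding the page really declares.

```python
import requests
from bs4 import BeautifulSoup


def soup_from_response(response, encoding='utf-8'):
    """Parse a requests response, decoding its body with `encoding`.

    The 'utf-8' default is an assumption; pass the page's real encoding.
    """
    # Set the encoding first so that response.text decodes the raw bytes
    # with it instead of with whatever requests guessed from the headers.
    response.encoding = encoding
    # response.text is already a decoded string, so BeautifulSoup needs
    # no from_encoding argument and no encode/decode calls afterwards.
    return BeautifulSoup(response.text, 'html.parser')


def fetch_soup(url, encoding='utf-8'):
    """Download `url` and return the parsed document (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return soup_from_response(response, encoding)
```

With this, soup = fetch_soup(formUrl) followed by soup.find('table', {'class': 'MKMTable specimenTable'}) operates on a real soup object, so nothing needs to be encoded or decoded after parsing.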

Tags: python, encoding, beautifulsoup, decoding, mechanize

Rappel
