
Python: Output of beautifulsoup has wrong encoding

I ran into an encoding problem when passing a response to BeautifulSoup. The readable output of the response is formatted properly, like Artikelstandort: Österreich, but after running it through BeautifulSoup it is transformed to Artikelstandort: Österreich. Here is the changed code:

def formTest (browser, formUrl, cardName, edition):
   browser.open (formUrl)

   data = browser.response().read()
   with open ('analyze.txt', 'wb') as textFile:
      print 'wrinting file'
      textFile.write (data)

   #BS4 -> need from_encoding
   soup = BeautifulSoup (data, from_encoding = 'latin-1')
   soup = soup.encode ('latin-1').decode('utf-8')
   table = soup.find('table', { "class" : "MKMTable specimenTable"})

data holds the correct content, but the soup has the wrong encoding. I tried various encode/decode combinations on the soup, but none of them worked.

The page where I pull my data from is: https://www.magickartenmarkt.de/Mutilate_Magic_2013.c1p256992.prod

Edit: I changed the encoding with prettify as suggested, but now I'm facing the following error:

TypeError: slice indices must be integers or None or have an __index__ method

What was changed with prettify? I printed the new output and the table is still in the "soup" (<table class="MKMTable specimenTable">)

Edit2:

New error is:

at: soup.encode ('latin-1').decode('utf-8')

Error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 518: invalid start byte

If I try other combinations of encodings and decodings, the error just moves to a different byte.

Rappel asked Apr 27 '26 00:04


1 Answer

You probably don't need the solution by now, but in case anyone else stops by, here is what you should do:
You should probably apply the encoding procedures to data and not to soup.
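
To illustrate why the decoding belongs on data, here is a minimal standard-library sketch. It assumes the server sent UTF-8 (the question does not confirm this) and uses a made-up sample string in place of the real response bytes. It shows how decoding UTF-8 bytes with a single-byte codec yields the garbled form from the question, and why the encode('latin-1').decode('utf-8') round trip raises UnicodeDecodeError.

```python
# Stand-in for the bytes returned by browser.response().read().
# Assumption: the server sent UTF-8; the sample text is made up.
data = "Artikelstandort: \u00d6sterreich".encode("utf-8")

# Decoding UTF-8 bytes with a single-byte codec produces the garbled form.
garbled = data.decode("cp1252")

# Decoding with the codec the bytes were written in keeps the text intact.
text = data.decode("utf-8")

# Re-encoding as Latin-1 and decoding as UTF-8 cannot work: Latin-1 stores
# each umlaut as one byte (0xd6 here, 0xfc for a lowercase u-umlaut), and
# such a byte on its own is not a valid UTF-8 sequence.
try:
    text.encode("latin-1").decode("utf-8")
    round_trip_ok = True
except UnicodeDecodeError:
    round_trip_ok = False

# ascii() keeps the output printable on any terminal encoding.
print(ascii(garbled))
print(ascii(text))
print(round_trip_ok)
```

If the page turns out to use a different codec, pass that codec's name to decode() instead.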
What I usually do is fetch the raw response with the requests library, enforce the encoding with response.encoding = 'utf-8', and then read the decoded content from response.text. The encoding has to be set before response.text is read, because that is the point where the body gets decoded.
Finally, I feed response.text to BeautifulSoup().
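
Putting this together with BeautifulSoup might look like the sketch below. The HTML snippet is invented for illustration, UTF-8 and the "html.parser" backend are assumptions, and only the table class name comes from the question. The sketch also keeps soup as a BeautifulSoup object: prettify() and encode() return plain strings, so calling .find('table', {...}) on their result invokes the string method find, which is the likely source of the TypeError quoted in the question.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

# Invented sample page standing in for the real response bytes.
data = (
    "<html><body><p>Artikelstandort: \u00d6sterreich</p>"
    '<table class="MKMTable specimenTable"><tr><td>x</td></tr></table>'
    "</body></html>"
).encode("utf-8")

# Decode the raw bytes first (assumed codec: UTF-8), then parse the text.
text = data.decode("utf-8")
soup = BeautifulSoup(text, "html.parser")

# soup is still a BeautifulSoup object, so find() searches the parse tree.
table = soup.find("table", {"class": "MKMTable specimenTable"})

# prettify() returns a plain string. Calling find() on it is str.find(),
# which treats the dict as a start index and raises TypeError.
pretty = soup.prettify()
try:
    pretty.find("table", {"class": "MKMTable specimenTable"})
    str_find_failed = False
except TypeError:
    str_find_failed = True

print(table is not None)
print(str_find_failed)
```

With requests, the same idea applies: set response.encoding first, then pass response.text to BeautifulSoup and leave the resulting soup object untouched.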

Kaan E. answered Apr 29 '26 12:04


