I ran into an encoding problem when passing a response to BeautifulSoup.
The readable output of the response is formatted properly, e.g. Artikelstandort: Österreich, but after running it through BeautifulSoup the umlaut comes out garbled instead of as Ö. Here is the code I changed:
def formTest(browser, formUrl, cardName, edition):
    browser.open(formUrl)
    data = browser.response().read()
    with open('analyze.txt', 'wb') as textFile:
        print 'writing file'
        textFile.write(data)
    # BS4 -> need from_encoding
    soup = BeautifulSoup(data, from_encoding='latin-1')
    soup = soup.encode('latin-1').decode('utf-8')
    table = soup.find('table', {"class": "MKMTable specimenTable"})
data has the correct data, but the soup has the wrong encoding. I tried various encoding/decoding on the soup, but got no working result.
The page where I pull my data from is: https://www.magickartenmarkt.de/Mutilate_Magic_2013.c1p256992.prod
Edit: I changed the encoding with prettify as suggested, but now I'm facing the following error:
TypeError: slice indices must be integers or None or have an __index__ method
What was changed by prettify? I printed the new output and the table is still in the "soup" (<table class="MKMTable specimenTable">).
Edit2:
New error is:
at: soup.encode ('latin-1').decode('utf-8')
Error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 518: invalid start byte
If I play with the encodings and decodings, I get errors about decoding some other byte instead.
You probably don't need the solution anymore, but in case anyone else stops by, here is what you should do:
You should apply the encoding procedures to data (the raw response), not to soup.
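To illustrate why, here is a small sketch (Python 3, standard library only) of what I believe is happening in the question. The sample bytes and the variable names are made up for the illustration, and I am assuming the page is served as UTF-8. It shows that decoding with the wrong codec silently garbles the text, that 0xfc cannot start a UTF-8 sequence, and that once the soup has been turned into a plain string, .find is the string method rather than the BeautifulSoup one.

```python
# Sample bytes as they might arrive from the server (assumed to be UTF-8).
raw = "Artikelstandort: Österreich".encode("utf-8")

# Decoding with the wrong codec does not fail, it silently garbles the text.
wrong = raw.decode("latin-1")
right = raw.decode("utf-8")

# 0xfc is "ü" in Latin-1, but it can never start a UTF-8 sequence,
# which matches the UnicodeDecodeError reported in the question.
try:
    b"\xfc".decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# After encode(...).decode(...) the "soup" is an ordinary string, so .find
# is str.find: its second argument must be an integer start index, not a dict.
html_text = '<table class="MKMTable specimenTable"></table>'
try:
    html_text.find("table", {"class": "MKMTable specimenTable"})
    find_error = None
except TypeError as exc:
    find_error = exc

print(repr(wrong))
print(repr(right))
print(type(decode_error).__name__, type(find_error).__name__)
```

So the decoding decision belongs on the bytes, before they are parsed, and the parsed soup should be left as a BeautifulSoup object.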
What I usually do is use the requests library to get the raw response, enforce the encoding with response.encoding = 'utf-8', and then take the text content with response.text.
Finally, I feed response.text to BeautifulSoup().
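A minimal sketch of that workflow, assuming Python 3 with the requests and bs4 packages installed. The helper names fetch_soup and soup_from_bytes are my own, and 'utf-8' is an assumption about the page, so adjust it if the site declares something else.

```python
import requests
from bs4 import BeautifulSoup


def soup_from_bytes(raw, encoding="utf-8"):
    """Decode the raw bytes first, then parse the resulting text."""
    text = raw.decode(encoding)
    return BeautifulSoup(text, "html.parser")


def fetch_soup(url, encoding="utf-8"):
    """Fetch a page with requests and parse it with an enforced encoding."""
    response = requests.get(url)
    # Set the encoding before reading .text so the body is decoded with it.
    response.encoding = encoding
    return BeautifulSoup(response.text, "html.parser")


if __name__ == "__main__":
    # URL taken from the question; the page may have changed or moved since.
    url = "https://www.magickartenmarkt.de/Mutilate_Magic_2013.c1p256992.prod"
    soup = fetch_soup(url)
    table = soup.find("table", {"class": "MKMTable specimenTable"})
    print(table is not None)
```

The soup stays a BeautifulSoup object the whole time, so soup.find keeps accepting a tag name and an attribute dict.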