Heres what I did..
>>> soup = BeautifulSoup (html)
>>> soup
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 96953: ordinal not in range(128)
>>>
>>> soup.find('div')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 11035: ordinal not in range(128)
>>>
>>> soup.find('span')
<span id="navLogoPrimary" class="navSprite"><span>amazon.com</span></span>
>>>
How can I simply remove troubling unicode characters from html
?
Or is there any cleaner solution ?
Try this way:
soup = BeautifulSoup (html.decode('utf-8', 'ignore'))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With