I had this web scraping code working a few minutes ago, but now I get this warning and encoding. Since this request doesn't return html, Beautifulsoup is returning a None type when I search for the contents of a tag. What is going wrong here? I tried to google a bit for this encoding problem, but couldn't find a clear answer.
import requests
from bs4 import BeautifulSoup
url = 'http://finance.yahoo.com/q?s=aapl&fr=uh3_finance_web&uhb=uhb2'
data = requests.get(url)
soup = BeautifulSoup(data.content).text
print(data)
Here are the results:
0.0 seconds
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<Response [200]>
{}
Process finished with exit code 0
The constructor of Beautifulsoup below worked for me:
soup = BeautifulSoup(open(html_path, 'r'),"html.parser",from_encoding="iso-8859-1")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With