I would like to use the pandas read_html() function to scrape the Yahoo Finance table shown in the screenshot (bordered in red).

However, I get an HTTPError: HTTP Error 404: Not Found
Here is my code:
!pip install pandas
!pip install requests
!pip install bs4
!pip install requests_html
!pip install pytest-astropy
!pip install nest_asyncio
!pip install plotly
import pandas as pd
from bs4 import BeautifulSoup
import requests
import requests_html
import nest_asyncio
import lxml
import html5lib
nest_asyncio.apply()
url_link = "https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27"
read_html_pandas_data = pd.read_html(url_link)  # raises HTTPError: HTTP Error 404: Not Found
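Try the following:
from io import StringIO
import pandas as pd
import requests
url_link = 'https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27'
# Send a browser-like User-Agent; without it the request is rejected
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
r = requests.get(url_link, headers=headers)
r.raise_for_status()  # fail loudly on a 4xx/5xx instead of parsing an error page
# Wrap the HTML in StringIO: pandas 2.1+ deprecates passing a literal HTML string
read_html_pandas_data = pd.read_html(StringIO(r.text))[0]
print(read_html_pandas_data)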
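If you'd rather not depend on requests, the standard library's urllib.request.Request can carry the same header. This is a sketch under assumptions: BROWSER_UA, build_request and fetch_html are illustrative names, and whether Yahoo accepts this particular User-Agent string is not guaranteed. Presumably pd.read_html(url) fails because pandas downloads the URL with urllib's default "Python-urllib/3.x" User-Agent.

```python
import urllib.request

# Assumption: a browser-like string is enough; Yahoo's exact filtering rules are not documented here.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request that carries an explicit User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, user_agent: str = BROWSER_UA, timeout: float = 10.0) -> str:
    """Download the page and decode it to text (needs network access; hypothetical helper)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would then look like html = fetch_html(url_link) followed by pd.read_html(StringIO(html))[0], exactly as in the requests version.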
A User-Agent header is needed, and read_html itself has no headers argument. You can fetch the page with requests first, specifying the appropriate header, and then hand the HTML over to pandas:
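from io import StringIO
from pandas import read_html as rh
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27', headers={'User-Agent': 'Mozilla/5.0'})
r.raise_for_status()
soup = bs(r.content, 'lxml')  # the 'lxml' parser requires the lxml package
# Select the price table by its data-test attribute (depends on Yahoo's current markup)
node = soup.select_one('[data-test="historical-prices"]')
if node is None:
    raise ValueError('historical-prices table not found; the page markup may have changed')
table = rh(StringIO(str(node)))[0]
print(table)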
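read_html can also do the table selection itself through its attrs parameter, which filters table elements by their HTML attributes, so the BeautifulSoup step becomes optional. A sketch, assuming the data-test="historical-prices" attribute sits on the table element itself; SAMPLE_HTML and price_table are hypothetical, and the values in SAMPLE_HTML are placeholders, not real NFLX prices.

```python
from io import StringIO

import pandas as pd

# Toy HTML standing in for the downloaded page; the numbers are placeholders.
SAMPLE_HTML = """
<html><body>
  <table class="other"><tr><th>Unrelated</th></tr><tr><td>x</td></tr></table>
  <table data-test="historical-prices">
    <thead><tr><th>Date</th><th>Close</th></tr></thead>
    <tbody>
      <tr><td>2021-01-04</td><td>1.5</td></tr>
      <tr><td>2021-01-05</td><td>2.5</td></tr>
    </tbody>
  </table>
</body></html>
"""


def price_table(html: str) -> pd.DataFrame:
    """Return the table whose data-test attribute is 'historical-prices'."""
    # attrs keeps only <table> elements with matching attributes, skipping the unrelated table
    tables = pd.read_html(StringIO(html), attrs={"data-test": "historical-prices"})
    return tables[0]


df = price_table(SAMPLE_HTML)
print(df)
```

On the real page you would pass r.text from the answers above instead of SAMPLE_HTML. If Yahoo moves or renames the attribute, read_html raises a ValueError because no table matches.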