Why is Python pandas read_html() not working?

Question

I would like to use the pandas read_html() function to scrape the Yahoo Finance table shown in the screenshot (bordered in red).


However, I get an error: HTTPError: HTTP Error 404: Not Found

Here is my code:

```python
!pip install pandas
!pip install requests
!pip install bs4
!pip install requests_html
!pip install pytest-astropy
!pip install nest_asyncio
!pip install plotly

import pandas as pd
from bs4 import BeautifulSoup
import requests
import requests_html
import nest_asyncio
import lxml
import html5lib
nest_asyncio.apply()

url_link = "https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27"
read_html_pandas_data = pd.read_html(url_link)
```

Md. Fazlul Hoque · Accepted Answer

Try the following:

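```python
from io import StringIO

import pandas as pd
import requests

url_link = 'https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27'
# Send a browser-like User-Agent; without it Yahoo responds with 404
r = requests.get(url_link, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'})
# read_html returns a list of DataFrames; take the first table.
# Newer pandas versions expect literal HTML to be wrapped in StringIO.
read_html_pandas_data = pd.read_html(StringIO(r.text))[0]
print(read_html_pandas_data)
```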
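The fix hinges on the headers argument. As far as I know, when pd.read_html() is given a URL, pandas downloads it through Python's urllib.request, and urllib's default opener identifies itself as Python-urllib/<version>. That Yahoo's 404 is triggered by this header is an assumption based on the header fix working. The stdlib-only sketch below makes no network request; it prints urllib's default header and builds a Request that carries a browser-like User-Agent instead.

```python
import urllib.request

# Headers that urllib's default opener (used by urllib.request.urlopen) adds
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
print(default_headers)  # e.g. {'User-agent': 'Python-urllib/3.12'}

# A Request carrying the same kind of header the answers pass to requests.get();
# urllib normalizes header names with str.capitalize(), hence "User-agent"
req = urllib.request.Request(
    "https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27",
    headers={"User-Agent": "Mozilla/5.0"},
)
print(req.get_header("User-agent"))  # Mozilla/5.0
```

Passing such a Request to urllib.request.urlopen() and handing the decoded body to pd.read_html() would avoid the requests dependency, though whether Yahoo still accepts this header is not guaranteed.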

QHarr · Answer

Because a User-Agent header is needed, and read_html doesn't let you specify one. You could grab the table first with requests, specifying the appropriate header, and then hand it over to pandas:

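```python
from io import StringIO

from pandas import read_html as rh
import requests
from bs4 import BeautifulSoup as bs

# A minimal User-Agent is enough to avoid the 404
r = requests.get('https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27', headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
# Keep only the historical-prices table, then let pandas parse it
table = rh(StringIO(str(soup.select_one('[data-test="historical-prices"]'))))[0]
print(table)
```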
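As an alternative to BeautifulSoup's select_one(), pd.read_html() can pick out the table itself through its attrs argument, which matches attributes on the <table> element. This is a minimal offline sketch on hypothetical HTML standing in for r.text; it assumes the data-test="historical-prices" attribute sits on the <table> tag itself, and Yahoo's current markup may differ. It also needs a parser backend (lxml, or bs4 with html5lib) installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the fetched page HTML (r.text): one unrelated
# table plus the table we actually want. The values are made up.
html = """
<table><tr><th>Other</th></tr><tr><td>ignore me</td></tr></table>
<table data-test="historical-prices">
  <thead><tr><th>Date</th><th>Close</th></tr></thead>
  <tbody>
    <tr><td>2021-07-01</td><td>100.5</td></tr>
    <tr><td>2021-07-02</td><td>101.25</td></tr>
  </tbody>
</table>
"""

# attrs filters <table> elements by their HTML attributes, so only the
# table with data-test="historical-prices" is parsed into a DataFrame
tables = pd.read_html(StringIO(html), attrs={"data-test": "historical-prices"})
table = tables[0]
print(table)
```

If no table matches, read_html raises a ValueError rather than silently parsing the string "None", which is what str(soup.select_one(...)) yields when the selector finds nothing.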


Tags: python, html, pandas, web-scraping

TropicalMagic
