
HTTP Error 403: Forbidden when reading HTML

Tags: python, pandas

I would like to read the tables from the following HTML page:

import pandas as pd
daily_info = pd.read_html('https://www.investing.com/earnings-calendar/', flavor='html5lib')
print(daily_info)

Unfortunately, this error appears:

urllib.error.HTTPError: HTTP Error 403: Forbidden

Is there any way to fix it?

JamesHudson81 asked Apr 24 '17 14:04



1 Answer

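Pretend to be a browser: download the page with a browser-like User-Agent header, then pass the HTML to pandas:

from io import StringIO
import pandas as pd
import requests

url = 'https://www.investing.com/earnings-calendar/'

# Headers that make the request look like it comes from a browser
header = {
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
  "X-Requested-With": "XMLHttpRequest"
}

r = requests.get(url, headers=header)
r.raise_for_status()  # fail early if the server still refuses the request

# Newer pandas versions deprecate passing literal HTML, so wrap the text in StringIO
dfs = pd.read_html(StringIO(r.text))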

Result:

In [201]: len(dfs)
Out[201]: 7

In [202]: dfs[0]
Out[202]:
    0   1   2   3
0 NaN NaN NaN NaN

In [203]: dfs[1]
Out[203]:
                 Unnamed: 0                                      Company    EPS /  Forecast Revenue /  Forecast.1 Market Cap  Time  \
0    Monday, April 24, 2017                                          NaN    NaN         NaN     NaN           NaN        NaN   NaN
1                       NaN                                 Acadia (AKR)     --      / 0.11      --          / --      2.63B   NaN
2                       NaN                                  Agree (ADC)     --      / 0.39      --          / --      1.34B   NaN
3                       NaN                                   Alcoa (AA)     --      / 0.53      --          / --      5.84B   NaN
4                       NaN                        American Campus (ACC)     --      / 0.27      --          / --      6.62B   NaN
5                       NaN                   Ameriprise Financial (AMP)     --      / 2.52      --          / --     19.76B   NaN
6                       NaN                          Avacta Group (AVTG)     --        / --   1.26M          / --     47.53M   NaN
7                       NaN                         Bank of Hawaii (BOH)    1.2      / 1.08  165.8M          / --      3.48B   NaN
8                       NaN                         Bank of Marin (BMRC)   0.74       / 0.8      --          / --    422.29M   NaN
9                       NaN                                Banner (BANR)     --      / 0.68      --          / --      1.82B   NaN
10                      NaN                           Barrick Gold (ABX)     --       / 0.2      --          / --     22.44B   NaN
11                      NaN                           Barrick Gold (ABX)     --      / 0.28      --          / --     30.28B   NaN
12                      NaN               Berkshire Hills Bancorp (BHLB)     --      / 0.54      --          / --      1.25B   NaN
13                      NaN   Brookfield Canada Office Properties (BOXC)     --        / --      --          / --        NaN   NaN

...
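
The traceback in the question comes from urllib, which pandas uses when read_html is given a URL. urllib identifies itself as "Python-urllib/x.y" by default, and some sites appear to reject that. If installing requests is not an option, the same idea can be sketched with the standard library alone. This is a minimal sketch, not a definitive fix: the helper names build_request and fetch_html are hypothetical, the User-Agent string is only an example, and the site may still refuse the request for other reasons.

```python
import urllib.request

# Browser-like headers. The exact User-Agent string is an illustrative assumption.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"
    ),
}


def build_request(url, headers=None):
    """Return a Request carrying browser-like headers (hypothetical helper)."""
    return urllib.request.Request(url, headers=headers or BROWSER_HEADERS)


def fetch_html(url, timeout=30):
    """Download a page as text; raises urllib.error.HTTPError on 403 and similar."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The text returned by fetch_html(url) can then be wrapped in io.StringIO and passed to pd.read_html, exactly as with r.text above.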
MaxU - stop WAR against UA answered Oct 12 '22 11:10