I am trying to scrape a page on a website that requires a login and am consitently getting a 403 Error.
I have modified the code from these 2 posts for my site, Using rvest or httr to log in to non-standard forms on a webpage and how to reuse a session to avoid repeated login when scraping with rvest?
library(rvest)
pgsession <- html_session("https://www.optionslam.com/earnings/stocks/MSFT?page=-1")
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, 'username'='user', 'password'='pass')
s <- submit_form(pgsession, filled_form) # s is your logged in session
When the code is run, I get this message:
Submitting with 'NULL'
Warning message:
In request_POST(session, url = url, body = request$values, encode = request$encode, :
Forbidden (HTTP 403).
I have also run the code this way, by updating user_agent as R.S. suggested in the comments, however, I receive the same error as above.
library(rvest)
library(httr)
uastring <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"
pgsession <- html_session("https://www.optionslam.com/earnings/stocks/MSFT?page=-1", user_agent(uastring))
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, 'username'='user', 'password'='pass')
s <- submit_form(pgsession, filled_form) # s is your logged in session
If you pull the page up without logging in, it shows you a bit of the data table at the bottom right below the text: "Earnings Events Available: 65"
Once logged in, it will show all 65 events and the table will be filled in which is what I want to download. I have all the code necessary to do that in place but am stuck just on the login part.
Thank you for your help.
The 403 Forbidden Error happens when the web page (or another resource) that you're trying to open in your web browser is a resource that you're not allowed to access.
The 403 Forbidden error occurs when a request is made the server cannot allow. This is often due to a firewall ruleset that strictly prohibits this specific request, but other settings such as permissions may prevent access based on user rights.
You can't always fix a 403 error on your own, but simple tricks like refreshing your page or clearing your cache could help. If visitors to your webpage are getting 403 errors, you may have to reconfigure it.
Using R.S.'s suggestion, I used RSelenium to log in successfully.
A quick note for fellow mac users on using either chrome or phantom. I am running El Capitan so had some issue getting the mac to recognize the paths to both of the bin files. Instead, I moved the bin files to /usr/local/bin and they ran without an issue.
Below is the code to do so:
library(RSelenium)
RSelenium::startServer()
remDr <- remoteDriver(browserName = "chrome")
remDr$open()
appURL <- 'https://www.optionslam.com/accounts/login/'
remDr$navigate(appURL)
remDr$findElement("id", "id_username")$sendKeysToElement(list("user"))
remDr$findElement("id", "id_password")$sendKeysToElement(list("password", key='enter'))
appURL <- 'https://www.optionslam.com/earnings/stocks/MSFT?page=-1'
remDr$navigate(appURL)
This can also be done with phantom,
library(RSelenium)
pJS <- phantom() # start phantomjs
appURL <- 'https://www.optionslam.com/accounts/login/'
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open()
remDr$navigate(appURL)
remDr$findElement("id", "id_username")$sendKeysToElement(list("user"))
remDr$findElement("id", "id_password")$sendKeysToElement(list("password", key='enter'))
appURL <- 'https://www.optionslam.com/earnings/stocks/MSFT?page=-1'
remDr$navigate(appURL)
Here's the answer to solve the problem in the original use case with rvest
:
library(rvest)
library(httr)
uastring <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"
pgsession <- html_session("https://www.optionslam.com/earnings/stocks/MSFT?page=-1", user_agent(uastring))
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform,
username = 'un',
password = 'ps')
s <- submit_form(pgsession, filled_form, submit = NULL, config(referer = pgsession$url)) # s is your logged in session
The requested requires knowledge of the page you've come from (the referer
(sic)).
config(referer = pgsession$url)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With