Scraping a website with a login page

I currently log in to the Timeform website using the following script.

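import time
from selenium import webdriver

# Start Chrome through the local chromedriver binary and open the sign-in page.
browser = webdriver.Chrome('E:/Shared Folders/Users/runnerjp/chromedriver/chromedriver.exe')
browser.get("https://www.timeform.com/horse-racing/account/sign-in?returnUrl=%2Fhorse-racing%2F")
time.sleep(3)  # crude wait for the page to finish loading
# Fill in the credentials.
username = browser.find_element_by_id("EmailAddress")
password = browser.find_element_by_id("Password")
username.send_keys("usr")
password.send_keys("pass")
# Locate the submit button and submit the form.
login_attempt = browser.find_element_by_xpath("//input[@type='submit']")
time.sleep(3)
login_attempt.submit()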

It works, but I find that using the Chrome web driver is hammering my CPU. Is there alternative code I could use that does not need to physically load the page in a browser to sign in?

emma perkins asked Jan 03 '23 22:01



1 Answer

All of the answers here have some merit, but the right approach depends on the type of website being scraped and how it authenticates the login.
If the webpage generates some or all of its content through JavaScript/AJAX requests, then driving a real browser with Selenium is usually the simplest way to go, as this allows the JavaScript to execute. However, to keep CPU usage to a minimum you can use a "headless" browser such as PhantomJS. PhantomJS is built on WebKit, the engine from which Chrome's Blink was forked, although its JavaScript engine differs from Chrome's V8, so pages generally behave much the same and you could test your code with Chrome and switch at the end. Note that PhantomJS is no longer maintained; Chrome itself can also be run headless.
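As a rough sketch of the headless option, the snippet below starts Chrome without a visible window. It assumes Selenium is installed and that a chromedriver matching your Chrome version can be found; the helper name `make_headless_chrome`, the `HEADLESS_ARGS` list and the window size are my own choices for illustration, not anything the site requires. How much CPU this saves will vary.

```python
# Flags passed to Chrome; "--headless" runs it without a visible window.
# The extra flags are illustrative assumptions, not requirements.
HEADLESS_ARGS = ["--headless", "--disable-gpu", "--window-size=1280,800"]


def make_headless_chrome():
    """Return a Chrome driver that runs without opening a window."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

With this in place, `browser = make_headless_chrome()` would stand in for the `webdriver.Chrome(...)` line in the question, and the rest of the script should not need to change.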

If the content of the page is "static", then you can use the requests module instead. How you do this depends on whether the webpage uses the "basic" authentication built into the HTTP protocol (most sites don't), in which case:

import requests
requests.get('https://api.github.com/user', auth=('user', 'pass'))

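as suggested by CodeMonkey.

If it uses something else, you'll have to analyse the login form to see which address the POST request is sent to (the form's action attribute), then build a request to that address with the username and password in fields keyed by the name attributes of the form's input elements (these often match the IDs, but it is the name that gets submitted). Use a requests.Session so that the cookies set at login are sent with later requests.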
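Here is a minimal sketch of that form-based approach, under some assumptions: the login page contains an ordinary HTML form, the first form on the page is the login form, and the field names are `EmailAddress` and `Password` (guessed from the element IDs in the question; check the page source). The helpers `LoginFormParser`, `build_login_request` and `login` are hypothetical names for illustration. Hidden inputs are copied across because many sites put an anti-forgery token there. I have not tested this against the site in question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the action URL and named <input> fields of the first form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Hidden inputs (e.g. anti-forgery tokens) keep their preset value.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_request(page_url, html, credentials):
    """Return (post_url, payload) for submitting the first form in html."""
    parser = LoginFormParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload.update(credentials)  # overwrite the empty username/password fields
    # An empty or relative action is resolved against the page's own URL.
    return urljoin(page_url, parser.action or ""), payload


def login(session, login_url, credentials):
    """Fetch the login page, then post the filled-in form with the same session."""
    page = session.get(login_url)
    post_url, payload = build_login_request(page.url, page.text, credentials)
    response = session.post(post_url, data=payload)
    response.raise_for_status()
    return response


# Usage (needs the third-party requests package; field names are assumptions):
#   import requests
#   with requests.Session() as session:
#       login(session, LOGIN_URL, {"EmailAddress": "usr", "Password": "pass"})
#       # later session.get(...) calls reuse the login cookies
```

A successful status code does not prove the login worked, since many sites return 200 with an error message, so it is worth checking the returned page for something only a signed-in user would see.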

James Kent answered Jan 16 '23 20:01
