
Scrape data from a page that requires a login

I am new to Python and web scraping, and I am trying to write a very basic script that gets data from a webpage that can only be accessed after logging in. I have looked at a number of examples, but none of them fixed the issue. This is what I have so far:

from bs4 import BeautifulSoup
import urllib, urllib2, cookielib

username = 'name'
password = 'pass'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('WebpageWithLoginForm')
resp = opener.open('WebpageIWantToAccess')
soup = BeautifulSoup(resp, 'html.parser')
print soup.prettify()

Right now, when I print the page, I get its contents as if I were not logged in. I think the issue has to do with how I am setting the cookies, but I am not sure, because I do not fully understand what the cookie processor and its libraries are doing. Thank you!

Current Code:

import requests
import sys

EMAIL = 'usr'
PASSWORD = 'pass'

URL = 'https://connect.lehigh.edu/app/login'

def main():
    # Start a session so we can have persistent cookies
    # Current requests releases do not accept a config= argument
    session = requests.Session()
    # This is the form data that the page sends when logging in
    login_data = {
        'username': EMAIL,
        'password': PASSWORD,
        'LOGIN': 'login',
    }

    # Authenticate
    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    r = session.get('https://lewisweb.cc.lehigh.edu/PROD/bwskfshd.P_CrseSchdDetl')

if __name__ == '__main__':
    main()
Aaron Rotem asked Nov 08 '22 11:11


1 Answer

You can use the requests module. A requests session keeps the cookies the server sets at login and sends them with later requests.

Take a look at the answer I've linked below.

https://stackoverflow.com/a/8316989/6464893
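
As a rough illustration of the session-based approach, here is a minimal sketch. It reuses the URLs and form field names from the question, which are assumptions: the real login form may expect other fields, such as a hidden token. The helper name login_and_fetch is made up for this example.

```python
import requests

# URLs and form field names are taken from the question; the real login form
# may expect different fields (for example a hidden CSRF token).
LOGIN_URL = 'https://connect.lehigh.edu/app/login'
PROTECTED_URL = 'https://lewisweb.cc.lehigh.edu/PROD/bwskfshd.P_CrseSchdDetl'


def login_and_fetch(session, login_url, protected_url, form_data):
    """Log in with one session, then fetch a protected page with the same session."""
    # POST the form fields; the session stores any cookies the server sets.
    login_response = session.post(login_url, data=form_data)
    login_response.raise_for_status()  # catches HTTP errors only, not a rejected login

    # The same session sends those cookies back automatically.
    page = session.get(protected_url)
    page.raise_for_status()
    return page.text


def main():
    # Placeholder credentials, as in the question.
    form_data = {'username': 'usr', 'password': 'pass', 'LOGIN': 'login'}
    with requests.Session() as session:
        print(login_and_fetch(session, LOGIN_URL, PROTECTED_URL, form_data))
```

Calling main() runs the flow against the URLs from the question. A successful HTTP status does not prove the login worked, so it is worth checking the returned HTML for something only a logged-in user would see.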

Harrison answered Nov 14 '22 23:11