 

Logging into a forum using Python Requests

I am trying to log into a forum using python requests. This is the forum I'm trying to log into: http://fans.heat.nba.com/community/

Here's my code:

import requests
import sys

URL = "http://fans.heat.nba.com/community/index.php?app=core&module=global&section=login"

def main():
    session = requests.Session()

    # This is the form data that the page sends when logging in
    login_data = {
        'ips_username': 'username',
        'ips_password': 'password',
        'signin_options': 'submit',
        'redirect':'index.php?'
    }

    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    q = session.get('http://fans.heat.nba.com/community/index.php?app=members&module=messaging&section=view&do=showConversation&topicID=4314&st=20#msg26627')
    print(session.cookies)
    print(r.status_code)
    print(q.status_code)

if __name__ == '__main__':
    main()

The URL is the forum's login page. With the 'q' request, the session tries to access a page on the forum (the private messenger) that can only be accessed when logged in. However, that request returns status code 403 (Forbidden), which means that I was unable to log in successfully.

Why am I unable to log in? In 'login_data', 'ips_username' and 'ips_password' are the names of the HTML form fields. However, I believe I have the other login fields ('signin_options', 'redirect') wrong.

Can somebody point me to the correct login fields, please?

asked Dec 02 '25 05:12 by Carnageta


2 Answers

There is a hidden input named auth_key in the form:

<input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' />

So you need to parse it and include it in the login POST. You could simply use a regex:

import re
import requests

def main():
    session = requests.Session()

    # Get the login page that contains the auth_key. Use the session (not
    # requests.get) so any cookies set here are sent with the login POST.
    r = session.get("http://fans.heat.nba.com/community/index.php?app=core&module=global&section=login")
    # Parse the value of the hidden auth_key field
    auth_key = re.findall("auth_key' value='(.*?)'", r.text)[0]

    # This is the form data that the page sends when logging in
    login_data = {
        'ips_username': 'username',
        'ips_password': 'password',
        'auth_key': auth_key,
    }

And the rest should be the same.
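Before pointing this at the live site, the pattern can be checked offline against the hidden input quoted above. A minimal sketch: that snippet stands in for the page HTML, and the "no form here" page is a made-up example of a page without the field.

```python
import re

# The hidden input quoted in this answer, used as stand-in page HTML
page_html = "<input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' />"

matches = re.findall("auth_key' value='(.*?)'", page_html)
print(matches)  # ['880ea6a14ea49e853634fbdc5015a024']
auth_key = matches[0]

# On a page without the field (e.g. an error page), findall returns [],
# so indexing with [0] would raise IndexError; check before indexing.
print(re.findall("auth_key' value='(.*?)'", "<html>no form here</html>"))  # []
```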

answered Dec 03 '25 18:12 by Chaker


As indicated by @Chaker in the comments, the login form requires you to send an auth_key that you need to read from an initial visit to a page first.

The auth_key is a hidden form field with a random value (generated and stored by the server), so every regular web browser sends that with the POST request. The server then validates the request and requires it to contain an auth_key that it knows is valid (by checking against its list of issued auth_keys). So the process needs to be as follows:

  • Visit the front page (or probably any page below it)
  • Read the value of the auth_key hidden field
  • Create a POST request that includes your credentials and that auth_key
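The three steps above can be sketched as one function. This is a minimal sketch, not a documented forum API: the name log_in is hypothetical, the field names are the ones used in this thread, and session is assumed to be a requests.Session (anything with compatible get/post methods works, which is how the stand-in session at the bottom exercises the flow without network access).

```python
import re

# Captures the value of the hidden auth_key input (same idea as the regexes in this thread)
AUTH_KEY_RE = re.compile(r"name='auth_key' value='(.*?)'")


def log_in(session, page_url, login_url, username, password):
    """Hypothetical helper: run the three login steps on one session.

    `session` is assumed to be a requests.Session, so cookies set by the
    first GET are sent with the POST.
    """
    # Step 1: visit a page that contains the login form
    page = session.get(page_url)
    # Step 2: read the hidden auth_key field
    match = AUTH_KEY_RE.search(page.text)
    if match is None:
        raise ValueError("no auth_key field found on %s" % page_url)
    # Step 3: POST the credentials together with the auth_key
    payload = {
        'ips_username': username,
        'ips_password': password,
        'auth_key': match.group(1),
    }
    return session.post(login_url, data=payload)


# Stand-in session so the flow can be exercised without network access
class FakeResponse:
    def __init__(self, text):
        self.text = text


class FakeSession:
    def __init__(self):
        self.posted = None  # records (url, data) of the last POST

    def get(self, url):
        return FakeResponse("<input type='hidden' name='auth_key' value='abc123' />")

    def post(self, url, data):
        self.posted = (url, data)
        return FakeResponse("logged in")


fake = FakeSession()
result = log_in(fake, 'page', 'login', 'username', 'password')
print(fake.posted)
# ('login', {'ips_username': 'username', 'ips_password': 'password', 'auth_key': 'abc123'})
```

Against the real forum, the call would be made inside a with requests.Session() as session: block, passing the page and login URLs; whether the server also wants extra fields such as rememberMe or referer is not something this sketch checks.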

So this seems to work:

import re
import requests

USERNAME = 'username'
PASSWORD = 'password'

AUTH_KEY = re.compile(r"<input type='hidden' name='auth_key' value='(.*?)' \/>")

BASE_URL = 'http://fans.heat.nba.com/community/'
LOGIN_URL = BASE_URL + 'index.php?app=core&module=global&section=login&do=process'
SETTINGS_URL = BASE_URL + 'index.php?app=core&module=usercp'

payload = {
    'ips_username': USERNAME,
    'ips_password': PASSWORD,
    'rememberMe': '1',
    'referer': 'http://fans.heat.nba.com/community/',
}

with requests.Session() as session:
    response = session.get(BASE_URL)
    auth_key = AUTH_KEY.search(response.text).group(1)
    payload['auth_key'] = auth_key
    print("auth_key: %s" % auth_key)

    response = session.post(LOGIN_URL, data=payload)
    print("Login Response: %s" % response)

    response = session.get(SETTINGS_URL)
    print("Settings Page Response: %s" % response)

assert "General Account Settings" in response.text

Output:

auth_key: 777777774ea49e853634fbdc77777777
Login Response: <Response [200]>
Settings Page Response: <Response [200]>

AUTH_KEY is a regular expression that matches any text that looks like <input type='hidden' name='auth_key' value='?????' \/> where ????? is a group of zero or more characters (non-greedy, which means it looks for the shortest possible match). The documentation on the re module should get you started with regular expressions. You can also test that regular expression in an online regex tester, have it explained, and toy around with it.
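To see why the non-greedy (.*?) matters, the sketch below compares the answer's pattern with a greedy (.*) variant on a made-up line holding two hidden inputs. The difference only shows up when several inputs share one line, because . does not match a newline by default.

```python
import re

# Made-up HTML with two hidden inputs on the same line
html = ("<input type='hidden' name='auth_key' value='abc123' />"
        "<input type='hidden' name='other' value='xyz' />")

# Non-greedy, as in the answer: stops at the first "' />" after the value
lazy = re.compile(r"<input type='hidden' name='auth_key' value='(.*?)' \/>")
# Greedy variant for comparison: runs on to the last "' />" on the line
greedy = re.compile(r"<input type='hidden' name='auth_key' value='(.*)' \/>")

print(lazy.search(html).group(1))    # abc123
print(greedy.search(html).group(1))  # abc123' /><input type='hidden' name='other' value='xyz
```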

Note: If you were to actually parse (X)HTML, you should always use an (X)HTML parser. However, for this quick-and-dirty extraction of a single hidden form field, a non-greedy regex does the job just fine.
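For the parser route, Python's standard-library html.parser is enough. A minimal sketch with a hypothetical HiddenFieldParser class that collects every hidden input on a page; unlike the regex, it does not depend on quote style or attribute order, and it also reveals any other hidden fields the login form might expect. A third-party parser such as BeautifulSoup would work equally well.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Hypothetical helper: collects name -> value for every hidden <input>."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, where value may be None for bare attributes.
        # Self-closing tags (<input ... />) reach here too, via the default
        # handle_startendtag.
        if tag != 'input':
            return
        attrs = dict(attrs)
        if (attrs.get('type') or '').lower() == 'hidden' and attrs.get('name'):
            self.hidden[attrs['name']] = attrs.get('value') or ''


# Stand-in page HTML based on the hidden input quoted earlier in this thread
page_html = (
    "<form method='post'>"
    "<input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' />"
    "<input type='text' name='ips_username' />"
    "</form>"
)

parser = HiddenFieldParser()
parser.feed(page_html)
print(parser.hidden)  # {'auth_key': '880ea6a14ea49e853634fbdc5015a024'}
```

With requests, feeding session.get(BASE_URL).text to the parser and reading parser.hidden['auth_key'] would replace the regex search.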

answered Dec 03 '25 17:12 by Lukas Graf