
Python: requests can't log in to a website

I need to scrape a website that requires a login. I'm trying to create a session and log in, because I have to scrape several pages after logging in, but I can't figure out why it's not working.

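import requests
from bs4 import BeautifulSoup

# Hypothetical placeholders: the real login page and the page to scrape after logging in.
login_url = "https://example.com/login"
url = "https://example.com/page-to-scrape"

login_data = {
    "log": "login",
    "login": "my email",
    "password": "my password",
}

# A Session keeps cookies between requests, so the login cookie is reused by the GET.
with requests.Session() as session:
    session.post(login_url, data=login_data)
    response = session.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup.title.get_text())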

The printed page title shows that the login is not working.

Here is the login form on the website:

<form method="post" id="signin-form" class="form-horizontal">
    <input type="hidden" name="referer" value="" />
    <div class="form-group">
        <label for="email_text" class="col-sm-4 control-label">Your login (email):</label>
        <div class="col-sm-8">
            <input type="email" class="form-control" id="email_text" value="" name="login" autofocus data-validation='{"parent":".form-group","events":["keyup","blur"],"rules":[{"name":"notblank"},{"name":"email"}]}' />
        </div>
    </div>
    <div class="form-group">
        <label for="password_text" class="col-sm-4 control-label">Password:</label>
        <div class="col-sm-8">
            <input type="password" class="form-control" id="password_text" name="password" data-validation='{"parent":".form-group","rules":[{"name":"min","min":5}]}' />
        </div>
    </div>
    <div class="form-group">
        <div class="col-sm-8 col-sm-offset-4">
            <div class="checkbox">
                <label>
                    <input type="checkbox" name="rememberme"> Remember me on this computer
                </label>
            </div>
        </div>
    </div>
    <div class="form-group">
        <div class="col-sm-offset-4 col-sm-8">
            <button type="submit" class="btn btn-default btn-lg" name="log">Log into your account</button>
            <a class="btn btn-default btn-lg mobile-show-inline-block" href="/account/create/">Create account</a>
            <a href="/account/lostpassword" class="btn btn-link btn-lg">Forgot your password?</a>
        </div>
    </div>
</form>

N.B.: Please don't suggest Selenium. I can do this with Selenium and have tested it, but I have to stick to requests because Selenium pops up a console window even when I use PhantomJS.

MD. Khairul Basar asked Jun 14 '17 15:06


2 Answers

I know this question was asked long ago, but I'll propose a solution for anyone still having trouble with this: check whether the form you are trying to post includes a hidden input, as the form in the question does (the hidden referer field). This is very common, and it sometimes prevents us from logging in to a site if we don't notice it. For example, suppose the site has a form like this:

<form method='post' id='signin-form' class='big-form'>
 <input type="hidden" id="whatever" name="foo" value="check">
 <input type="text" id="u" name="user">
 <input type="password" id="pwd" name="pass">
</form>

In that case, login_data should look like this:

login_data = {
    "foo": "check",
    "user": "your username",
    "pass": "your password",
}

Once you have done this, and provided the website does not check the request headers, you should be able to log in to the website with the requests module.
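A minimal sketch of this advice applied to the question's form, under stated assumptions: rather than hard-coding hidden fields, it reads them from the login page itself, so values that change per visit (such as tokens) are picked up as well. It uses only the standard library's html.parser (BeautifulSoup would do the same job). FormFieldCollector, build_login_payload and login_and_fetch are hypothetical names, and the parser simplifies the browser's rules: named inputs are collected, unchecked checkboxes and radios are skipped, and buttons, selects and textareas are ignored. A browser only sends the name/value pair of the submit button that was clicked, so the question's name="log" button (which has no value attribute and is therefore sent as an empty string) is passed in with the credentials if the server needs it. Because the question's form has no action attribute, the browser posts it back to the URL of the page it came from, which is why login_and_fetch posts to login_page.url.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the name/value pairs of the <input> elements in one form.

    Simplified sketch: unchecked checkboxes/radios are skipped and
    <button>, <select> and <textarea> elements are ignored.
    """

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # Only look inside the form whose id we were asked for.
            self.in_form = attrs.get("id") == self.form_id
        elif tag == "input" and self.in_form and attrs.get("name"):
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("checkbox", "radio"):
                if "checked" not in attrs:
                    return  # browsers omit unchecked boxes entirely
                default = "on"  # browsers send "on" when value is missing
            else:
                default = ""
            value = attrs.get("value")
            self.fields[attrs["name"]] = default if value is None else value

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_payload(page_html, form_id, credentials):
    """Hidden/default form fields from the page, overridden by credentials."""
    collector = FormFieldCollector(form_id)
    collector.feed(page_html)
    collector.close()
    payload = dict(collector.fields)
    payload.update(credentials)
    return payload


def login_and_fetch(login_url, target_url, credentials, form_id="signin-form"):
    """Log in with a requests.Session and return the HTML of target_url."""
    import requests  # third-party; imported here so the parser above needs only the stdlib

    with requests.Session() as session:
        login_page = session.get(login_url)
        payload = build_login_payload(login_page.text, form_id, credentials)
        # No action attribute on the form: the browser posts back to the page's own URL.
        session.post(login_page.url, data=payload).raise_for_status()
        return session.get(target_url).text
```

For the question's form, a call might look like login_and_fetch(login_url, url, {"login": "my email", "password": "my password", "log": ""}). A successful status code does not prove the login worked, since sites often answer a failed login with 200 and the login page again; checking whether the returned HTML still contains id="signin-form" is a more direct test.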

Half_Bit answered Oct 15 '22 14:10


You might be missing some headers. I would intercept a request made by a web browser (for example, in the browser's developer tools) to see what you are missing, then add those headers to your request.

You will find information on how to do this in the official documentation: http://docs.python-requests.org/en/master/user/quickstart/#custom-headers
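A minimal sketch of that advice, assuming the missing headers are ones sites commonly check (User-Agent, Referer and Origin); which headers this particular site actually needs is unknown, so compare against what your browser really sends. EXAMPLE_USER_AGENT, browser_like_headers and post_like_browser are hypothetical names, and session is expected to be a requests.Session.

```python
from urllib.parse import urlsplit

# Hypothetical placeholder: copy the exact User-Agent your browser sends (dev tools, Network tab).
EXAMPLE_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def browser_like_headers(page_url, user_agent=EXAMPLE_USER_AGENT):
    """Headers a browser typically sends when submitting a form on page_url."""
    parts = urlsplit(page_url)
    return {
        # Without this, requests identifies itself as python-requests/<version>.
        "User-Agent": user_agent,
        # The page the form was submitted from.
        "Referer": page_url,
        # Scheme and host (plus port) only; modern browsers include it on form POSTs.
        "Origin": f"{parts.scheme}://{parts.netloc}",
    }


def post_like_browser(session, login_url, payload):
    """POST the login form through an existing session with browser-like headers.

    Headers passed here are merged with session.headers for this request only.
    """
    return session.post(login_url, data=payload, headers=browser_like_headers(login_url))
```

To send the same headers on every request rather than only on the login POST, session.headers.update(browser_like_headers(login_url)) on the session works as well.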

Anthony Rossi answered Oct 15 '22 14:10