I need to scrape a website that requires a login. I'm trying to create a session
and log in, because I have to scrape several different pages after logging in, but I can't figure out why it isn't working.
import requests
from bs4 import BeautifulSoup
login_data = {
    "log": "login",
    "login": "my email",
    "password": "my password"
}
session = requests.session()
session.post(login_url, data=login_data)
response = session.get(url)
html = response.text
soup = BeautifulSoup(html, "html.parser")
print(soup.title.get_text())
The printed title shows that the login did not work.
Here is the website form.
<form method="post" id="signin-form" class="form-horizontal">
<input type="hidden" name="referer" value="" />
<div class="form-group">
<label for="email_text" class="col-sm-4 control-label">Your login (email):</label>
<div class="col-sm-8">
<input type="email" class="form-control" id="email_text" value="" name="login" autofocus data-validation='{"parent":".form-group","events":["keyup","blur"],"rules":[{"name":"notblank"},{"name":"email"}]}' />
</div>
</div>
<div class="form-group">
<label for="password_text" class="col-sm-4 control-label">Password:</label>
<div class="col-sm-8">
<input type="password" class="form-control" id="password_text" name="password" data-validation='{"parent":".form-group","rules":[{"name":"min","min":5}]}' />
</div>
</div>
<div class="form-group">
<div class="col-sm-8 col-sm-offset-4">
<div class="checkbox">
<label>
<input type="checkbox" name="rememberme"> Remember me on this computer
</label>
</div>
</div>
</div>
<div class="form-group">
<div class="col-sm-offset-4 col-sm-8">
<button type="submit" class="btn btn-default btn-lg" name="log">Log into your account</button>
<a class="btn btn-default btn-lg mobile-show-inline-block" href="/account/create/">Create account</a>
<a href="/account/lostpassword" class="btn btn-link btn-lg">Forgot your password?</a>
</div>
</div>
</form>
N.B.: Please don't suggest Selenium. I can do this with Selenium, and I have tested that, but I have to stick to requests because Selenium pops up a console window even if I use PhantomJS.
I know this question was asked long ago, but I'll propose a solution for those who are still having trouble with this. I recommend checking whether the form you are trying to post includes a hidden input, as the form in the question does. This is very common, and missing it can prevent you from logging in to a site. So, suppose the site has a form like this:
<form method='post' id='signin-form' class='big-form'>
<input type="hidden" id="whatever" name="foo" value="check">
<input type="text" id="u" name="user">
<input type="password" id="pwd" name="pass">
</form>
In that case, the variable login_data should look like this:
login_data = {
    "foo": "check",
    "user": "your username",
    "pass": "your password",
}
Having done this, and provided the website does not check the headers, you should have no trouble logging in to a website via the requests module.
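Hidden field values are sometimes generated per page load, so rather than hard-coding them you can read them from the login page first. The following is only a rough sketch of that idea, not a guaranteed fix for the site in the question. The names HiddenInputCollector, build_login_payload and log_in are made up for illustration, session is assumed to be a requests.Session, and the field extraction uses the standard library's html.parser (BeautifulSoup would work just as well).

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        name = attributes.get("name")
        if input_type == "hidden" and name:
            # A missing or empty value attribute is submitted as "".
            self.fields[name] = attributes.get("value") or ""


def build_login_payload(page_html, credentials):
    """Merge the page's hidden inputs with the credentials to post."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload.update(credentials)
    return payload


def log_in(session, login_url, credentials):
    """Fetch the login page, then post hidden fields plus credentials.

    `session` is assumed to be a requests.Session, so cookies set by the
    first GET are sent along with the POST.
    """
    page = session.get(login_url)
    page.raise_for_status()
    response = session.post(login_url, data=build_login_payload(page.text, credentials))
    response.raise_for_status()
    return response


# Possible usage (login_url is the variable from the question):
# import requests
# session = requests.Session()
# log_in(session, login_url, {"login": "my email", "password": "my password"})
```

For the form in the question, this would add the hidden referer field (with an empty value) to the posted data. Whether that alone is enough depends on what the server actually checks.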
You might be missing some headers. I would inspect the request your web browser makes when you log in (for example in the browser's developer tools) to see which headers you are missing, then add those headers to your request.
You will find information on how to do this in the official documentation: http://docs.python-requests.org/en/master/user/quickstart/#custom-headers
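As a minimal sketch of adding headers, assuming session is a requests.Session: the header values below are placeholders, not values known to work for the site in the question, and should be replaced with whatever your own browser sends. The names BROWSER_HEADERS and with_browser_headers are made up for this example.

```python
# Placeholder values: copy the real ones from the request your browser
# sends when you log in manually.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def with_browser_headers(session, referer=None):
    """Set browser-like default headers on a requests.Session-like object.

    Headers stored on session.headers are sent with every later request
    made through that session.
    """
    session.headers.update(BROWSER_HEADERS)
    if referer:
        session.headers["Referer"] = referer
    return session


# Possible usage (login_url and login_data are the variables from the question):
# import requests
# session = with_browser_headers(requests.Session(), referer=login_url)
# session.post(login_url, data=login_data)
```

Setting the headers on the session rather than on a single call means the later session.get calls for the pages you want to scrape send the same headers as the login request.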