I'm trying to login a website for some scraping using Python and requests library, I am trying the following (which doesn't work):
import requests
headers = {'User-Agent': 'Mozilla/5.0'}
payload = {'username':'niceusername','password':'123456'}
In [12]: r = requests.post('https://admin.example.com/login.php',headers=headers,data=payload)
But nada, getting a redirect to the login page. Do I need to open a session? am I doing a wrong POST request, do I need to load the cookies? or does session does that automatically? I am lost here, some help and explanations are needed.
The website I'm trying to login is php, do I need to "capture the set-cookie and set the cookie header"? if so I have no idea how to do it. The webpage is a form with the following (if it helps): input :username' 'password' 'id':'myform', 'action':"login.php
Some extra information, maybe you can see what I'm missing here..
In [13]: r.headers
Out[13]: CaseInsensitiveDict({'content-encoding': 'gzip', 'transfer-encoding': 'chunked',
'set-cookie': 'PHPSESSID=v233mnt4malhed55lrpc5bp8o1; path=/',
'expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'vary': 'Accept-Encoding', 'server': 'nginx',
'connection': 'keep-alive', 'pragma': 'no-cache',
'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0',
'date': 'Tue, 24 Dec 2013 10:50:44 GMT', 'content-type': 'text/html'})
In [14]: r.cookies
Out[14]: <<class 'requests.cookies.RequestsCookieJar'>[Cookie(version=0, name='PHPSESSID',
value='v233mnt4malhed55lrpc5bp8o1', port=None, port_specified=False, domain='admin.example.com',
domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False,
expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False)]>
I would really appreciate the help, thanks!
update, with answer thanks to atupal:
import requests
headers = {'User-Agent': 'Mozilla/5.0'}
payload = {'username':'usr','pass':'123'}
link = 'https://admin.example.com/login.php'
session = requests.Session()
resp = session.get(link,headers=headers)
# did this for first to get the cookies from the page, stored them with next line:
cookies = requests.utils.cookiejar_from_dict(requests.utils.dict_from_cookiejar(session.cookies))
resp = session.post(link,headers=headers,data=payload,cookies =cookies)
#used firebug to check POST data, password, was actually 'pass', under 'net' in param.
#and to move forward from here after is:
session.get(link)
You can use the Session object
import requests
headers = {'User-Agent': 'Mozilla/5.0'}
payload = {'username':'niceusername','password':'123456'}
session = requests.Session()
session.post('https://admin.example.com/login.php',headers=headers,data=payload)
# the session instance holds the cookie. So use it to get/post later.
# e.g. session.get('https://example.com/profile')
Send a POST request with content type = 'form-data':
import requests
files = {
'username': (None, 'myusername'),
'password': (None, 'mypassword'),
}
response = requests.post('https://example.com/abc', files=files)
I was having problems here (i.e. sending form-data whilst uploading a file) until I used the following:
files = {'file': (filename, open(filepath, 'rb'), 'text/xml'),
'Content-Disposition': 'form-data; name="file"; filename="' + filename + '"',
'Content-Type': 'text/xml'}
That's the input that ended up working for me. In Chrome Dev Tools -> Network tab, I clicked the request I was interested in. In the Headers tab, there's a Form Data section, and it showed both the Content-Disposition and the Content-Type headers being set there.
I did NOT need to set headers in the actual requests.post() command for this to succeed (including them actually caused it to fail)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With