Lately I've been trying to scrape information from a website (kicktipp) using Node.js, the request module, and cheerio. Since this site requires authentication to view most of its pages, I tried to log in via a POST request and then check whether the user is logged in, using the following code (I replaced the credentials with dummy data, but I use real data in my actual script):
var request = require('request');
var jar = request.jar();
var request = request.defaults({
    jar: jar,
    followAllRedirects: true
});
var jar = request.jar();
var cheerio = require('cheerio');

request.post({
    url: 'http://www.kicktipp.de/info/profil/loginaction',
    headers: { 'content-type': 'application/x-www-form-urlencoded' },
    method: 'post',
    jar: jar,
    body: '[email protected]&passwort=1234567890&_charset_=UTF-8&submitbutton=Anmelden'
}, function(err, res, body) {
    if (err) {
        return console.error(err);
    }
    request.get({
        url: 'http://www.kicktipp.de/',
        method: 'get',
        jar: jar
    }, function(err, res, body) {
        if (err) {
            return console.error(err);
        }
        var $ = cheerio.load(body);
        var text = $('.dropdownbox > li > a').text();
        console.log(text);
        var error = $('#kicktipp-content > div.messagebox.errors > p').text();
        console.log(error);
        var cookies = jar.getCookies('http://www.kicktipp.de/');
        console.log(cookies);
    });
});
The parameters sent by the HTML form (as inspected in the browser) look like this:
[email protected]&passwort=1234567890&_charset_=UTF-8&submitbutton=Anmelden
With that script, my cookie jar looks like this:
[ Cookie="JSESSIONID=F650D7F5CD6AF4F6B0944B2190EE2D29.kt213; Path=/; hostOnly=true; aAge=1ms; cAge=179ms" ]
The JSESSIONID is saved successfully, but the user is not logged in: console.log(text) prints Login, whereas it should print Logout if the user is signed in properly.
After inspecting the login request in the browser, I noticed that the browser receives a new cookie every time a page on this domain is requested, via Set-Cookie in the response header, like this:
Set-Cookie: login=bS5zcGxpZXRob2V2ZXJAZ21haWwuY29tOjE0NzU0MDA3MjAxMjA6Mzg1NTI4OGY3ODgzN2FkMzllNTA0NWNkY2ZjMjBjZGM; Domain=.kicktipp.de; Expires=Sun, 02-Oct-2016 09:32:00 GMT; Path=/; HttpOnly
However, I'm not able (or just don't know how) to get this cookie into my request jar and therefore visit the page as a logged-in user.
Is there anything I'm missing here to stay logged in (or log in to the page at all)? Thanks in advance.
The problem is that this site seems to need a specific cookie that you get on your first page visit (in this case it seems to be a timezone cookie). To get this cookie, you just need to visit the page (using a GET request) before sending the login (POST) request to the server. In this case it is as easy as wrapping another GET request around the code above:
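var request = require('request');

var loginLink = 'http://www.kicktipp.de/info/profil/login';
// create a clean jar; the same jar must be passed to every request that follows
var j = request.jar();
// visit the login page first so the server can set its initial cookie(s)
request.get({url: loginLink, jar: j}, function(err, httpResponse, html) {
    if (err) {
        return console.error(err);
    }
    // place POST request and rest of the code here, passing jar: j
});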
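To make it clearer why the order of requests matters, here is a minimal sketch of what a shared cookie jar does across the three steps (initial GET, login POST, later GET). It is not the request module's implementation: the helper names storeCookies and cookieHeader are hypothetical, the cookie values are made up, and attributes such as Domain, Path, and Expires are simply ignored here, which a real jar would not do.

```javascript
// Hypothetical helper: record the "name=value" part of each Set-Cookie header.
// Attributes (Path, Domain, Expires, HttpOnly) are dropped in this sketch.
function storeCookies(jar, setCookieHeaders) {
    (setCookieHeaders || []).forEach(function (header) {
        var pair = header.split(';')[0];
        var eq = pair.indexOf('=');
        if (eq > 0) {
            jar[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
        }
    });
    return jar;
}

// Hypothetical helper: build the Cookie request header from everything stored so far.
function cookieHeader(jar) {
    return Object.keys(jar).map(function (name) {
        return name + '=' + jar[name];
    }).join('; ');
}

var demoJar = {};

// Step 1: the initial GET of the login page sets the session cookie.
storeCookies(demoJar, ['JSESSIONID=abc123; Path=/']);

// Step 2: the login POST sends that cookie and the response adds the login cookie.
storeCookies(demoJar, ['login=made-up-token; Domain=.kicktipp.de; Path=/; HttpOnly']);

// Step 3: any later request sends both cookies back to the server.
console.log(cookieHeader(demoJar));
```

If the POST were sent with a fresh or different jar, step 2 would start without the cookies from step 1, which matches the behaviour described in the question.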