I'm trying to use Node.js to scrape a website that requires a login by POST. Then once I'm logged in I can access a separate webpage by GET.
The first problem right now is logging in. I've tried to use request to POST the login information, but the response I get does not appear to be logged in.
    exports.getstats = function (req, res) {
        // POST the login form, then forward whatever page comes back
        request.post({url: requesturl, form: lform}, function (err, response, body) {
            res.writeHead(200, {"Content-Type": "text/html"});
            res.write(body);
            res.end();
        });
    };
Here I'm just forwarding the page I get back, but the page I get back still shows the login form, and if I try to access another page it says I'm not logged in.
I think I need to maintain the client side session and cookie data, but I can find no resources to help me understand how to do that.
As a follow-up, I ended up using zombiejs to get the functionality I needed.
This cookie contains the session's unique ID, which is generated on the server and then stored on the client. The client sends the cookie with every request to the server. The server uses the session ID to look up the session saved in the database or session store, which maintains a one-to-one match between a session and a cookie.
Session management can be done in Node.js by using the express-session module, which saves session data in key-value form. With this module, the session data is not saved in the cookie itself, just the session ID.
Here, since sess is global, the session won't work for multiple users as the server will create the same session for all the users. This can be solved by using what is called a session store. We have to store every session in the store so that each one will belong to only a single user.
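As a rough illustration of the idea, and not how express-session itself is implemented, the sketch below keeps every session in an in-memory store keyed by a random session ID, so each cookie maps to exactly one user's session. The names `sessionStore`, `createSession`, `findSession`, and the cookie name `sid` are all hypothetical.

```javascript
var crypto = require('crypto');

// Hypothetical in-memory session store: session ID -> session data.
// A real application would typically use a database-backed store instead.
var sessionStore = {};

// Create a new session and return its ID (the value that goes in the cookie).
function createSession(data) {
    var sessionId = crypto.randomBytes(16).toString('hex');
    sessionStore[sessionId] = data;
    return sessionId;
}

// Parse a "Cookie" request header such as "theme=dark; sid=abc123".
function parseCookies(cookieHeader) {
    var cookies = {};
    (cookieHeader || '').split(';').forEach(function (pair) {
        var index = pair.indexOf('=');
        if (index === -1) { return; }
        cookies[pair.slice(0, index).trim()] = pair.slice(index + 1).trim();
    });
    return cookies;
}

// Look up the session that belongs to the cookie sent with a request.
// Returns undefined when the cookie is missing or the ID is unknown.
function findSession(cookieHeader) {
    var sessionId = parseCookies(cookieHeader).sid; // "sid" is an assumed cookie name
    if (!sessionId || !Object.prototype.hasOwnProperty.call(sessionStore, sessionId)) {
        return undefined;
    }
    return sessionStore[sessionId];
}
```

Because each caller gets its own random ID, two users never share a session the way they would with a single global variable.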
Express sessions are used in a Node.js web application to maintain the state of a user. To install express-session, run npm install express-session --save in your terminal. The most common use case for sessions is authentication.
You need to make a cookie jar and use the same jar for all related requests.
    var cookieJar = request.jar();
    request.post({url : requesturl, jar: cookieJar, form: lform}, ...
That should in theory allow you to scrape pages with GET as a logged-in user, but only once you get the actual login code working. Based on your description of the response to your login POST, that may not be actually working correctly yet, so the cookie jar won't help until you fix the problems in your login code first.
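A minimal sketch of the whole flow, assuming the login POST itself succeeds: the same jar is passed to the login request and to the follow-up GET, so any cookies set at login are sent again. The function name `loginThenFetch` and the parameters `loginUrl`, `pageUrl`, and `loginForm` are hypothetical. The `request` module is passed in as an argument rather than required inside the function, so nothing here depends on how it was loaded.

```javascript
// Log in with a POST, then fetch another page with a GET using the same cookie jar.
// `request` is the request module; the other parameter names are placeholders.
function loginThenFetch(request, loginUrl, pageUrl, loginForm, callback) {
    var cookieJar = request.jar(); // one jar shared by all related requests

    request.post({url: loginUrl, jar: cookieJar, form: loginForm}, function (err, response, body) {
        if (err) { return callback(err); }

        // Cookies set by the login response are now in cookieJar,
        // so this GET is sent with the session cookie attached.
        request.get({url: pageUrl, jar: cookieJar}, function (err, response, body) {
            if (err) { return callback(err); }
            callback(null, body);
        });
    });
}
```

It could be called as `loginThenFetch(require('request'), requesturl, pageUrl, lform, callback)`, where `pageUrl` stands for whichever page you want to read once logged in. If the body passed to the callback still shows the login form, the problem is in the login request itself rather than in the cookie handling.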