I want to use Node.js as a tool for website scraping. I have already implemented a script which logs me in to the system and parses some data from the page.
The steps are:
1. Open login page
2. Enter login data
3. Submit login form
4. Go to desired page
5. Grab and parse values from the page
6. Save data to file
7. Exit
The obvious problem is that my script has to log in every time it runs, and I want to eliminate that. I want to implement some kind of cookie management system in which I save the cookies to a .txt file and, on the next run, load them from the file and send them in the request headers.
This kind of cookie management system is not hard to implement, but how do I access cookies in Node.js? The only way I have found is through the response object passed to the request callback, where you can do something like this:
request.get({
  headers: requestHeaders,
  uri: user.getLoginUrl(),
  followRedirect: true,
  jar: jar,
  maxRedirects: 10
}, function(err, res, body) {
  if (err) {
    console.log('GET request failed, here is the error:');
    console.log(err);
    return; // res may be undefined on failure, so stop here
  }
  // Get cookies from the response (the header is absent if no cookies were set)
  var responseCookies = res.headers['set-cookie'] || [];
  var requestCookies = '';
  for (var i = 0; i < responseCookies.length; i++) {
    // Keep only the leading name=value pair and drop attributes such as Path or Expires
    var oneCookie = responseCookies[i].split(';');
    requestCookies = requestCookies + oneCookie[0] + ';';
  }
});
Now the content of the variable requestCookies can be saved to the .txt file and loaded the next time the script is executed. This way you avoid logging the user in every time the script runs.
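The save and load step described above could be sketched roughly as follows. This is only an illustration using Node's built-in fs module; the file name cookies.txt and the helper names toCookieHeader, saveCookies and loadCookies are assumptions, not part of the original script. It also ignores cookie attributes such as Expires, so an expired cookie would still be written and sent.

```javascript
const fs = require('fs');

// Hypothetical default path; any writable location works.
const COOKIE_FILE = 'cookies.txt';

// Turn an array of raw Set-Cookie header values into one Cookie header value.
// Only the leading name=value pair of each entry is kept; attributes are dropped.
function toCookieHeader(setCookieHeaders) {
  return (setCookieHeaders || [])
    .map(function (raw) { return raw.split(';')[0].trim(); })
    .filter(function (pair) { return pair.length > 0; })
    .join('; ');
}

// Write the Cookie header value to disk.
function saveCookies(cookieHeader, file) {
  fs.writeFileSync(file || COOKIE_FILE, cookieHeader, 'utf8');
}

// Read the saved Cookie header value; returns '' when no file exists yet.
function loadCookies(file) {
  try {
    return fs.readFileSync(file || COOKIE_FILE, 'utf8').trim();
  } catch (err) {
    if (err.code === 'ENOENT') return '';
    throw err;
  }
}
```

On a later run, something like requestHeaders.Cookie = loadCookies() would send the saved cookies. If the result is empty, or the site redirects back to the login page, the script would need to log in again and call saveCookies with the fresh values.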
Is this the right way, or is there a method which returns the cookies?
NOTE: If you want to set up your request object to automatically resend received cookies on every subsequent request, use the following lines during object creation:
var request = require("request");
request = request.defaults({jar: true});//Send cookies on every subsequent requests
Another route is /getcookie, which is used to get all the cookies and show them on the webpage. At the end of the code, the server listens on port 3000 so that it can run. We can check cookies by visiting localhost:3000/setcookie.
The Set-Cookie HTTP response header is used to send a cookie from the server to the user agent, so that the user agent can send it back to the server later. To send multiple cookies, multiple Set-Cookie headers should be sent in the same response.
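As a rough illustration of sending several cookies in one response, the sketch below uses Node's built-in http module. The handler name handleRequest, the factory createCookieServer and the cookie values are made up for this example. Passing an array to res.setHeader makes Node emit one Set-Cookie header per element.

```javascript
const http = require('http');

// Hypothetical handler; the cookie names and values are for illustration only.
function handleRequest(req, res) {
  // An array value produces one Set-Cookie header per element.
  res.setHeader('Set-Cookie', [
    'session=abc123; HttpOnly; Path=/',
    'theme=dark; Path=/'
  ]);
  res.end('cookies set');
}

// Builds the server without starting it; call .listen(port) to run it.
function createCookieServer() {
  return http.createServer(handleRequest);
}
```

Calling createCookieServer().listen(3000) would start it. On the client side, response.headers['set-cookie'] would then be an array with two entries.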
Now, to use cookies with Express, we need cookie-parser. cookie-parser is a middleware which parses the cookies attached to the client request object. To use it, we require it in our index.js file; it is used the same way as any other middleware.
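To give an idea of what parsing the cookies on a request involves, here is a simplified sketch in plain JavaScript. It is not cookie-parser's actual implementation; the function name parseCookieHeader is hypothetical, and features such as signed cookies are left out.

```javascript
// Parse a Cookie request header such as "a=1; b=2" into a plain object.
function parseCookieHeader(header) {
  const cookies = {};
  if (!header) return cookies;

  header.split(';').forEach(function (part) {
    const idx = part.indexOf('=');
    if (idx === -1) return; // skip malformed entries without "="

    const name = part.slice(0, idx).trim();
    let value = part.slice(idx + 1).trim();
    try {
      value = decodeURIComponent(value);
    } catch (e) {
      // keep the raw value if it is not valid percent-encoding
    }

    // The first occurrence of a name wins.
    if (name && !Object.prototype.hasOwnProperty.call(cookies, name)) {
      cookies[name] = value;
    }
  });

  return cookies;
}
```

With a header like req.headers.cookie, parseCookieHeader would return an object keyed by cookie name, which is roughly the shape a cookie-parsing middleware exposes to route handlers.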
Once you have a cookie, the browser can send back the cookie to the backend. This could have a number of applications: user tracking, personalization, and most important, authentication. To properly identify you on each subsequent request, the backend checks the cookie coming from the browser in the request.
In my case, I've used the 'http' library like this:
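var http = require('http');
http.get(url, function(response) {
  // 'set-cookie' is an array of raw Set-Cookie header strings (undefined if none were sent)
  var cookies = response.headers['set-cookie'];
});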