I've come across many tutorials explaining how to scrape public websites that don't require authentication/login, using node.js.
Can somebody explain how to scrape sites that require login using node.js?
Web scraping is the process of extracting data from a website in an automated way, and Node.js can be used for it. Even though other languages and frameworks are more popular for web scraping, Node.js can do the job well too.
Use Mikeal's Request library with cookie support enabled, like this:
var request = require('request').defaults({jar: true})
First create an account on that site manually, then pass the username and password as form parameters in the POST request to the site's login endpoint. The server responds with a session cookie, which Request remembers, so subsequent requests can access the pages that require you to be logged in.
Note: this approach doesn't work if something like reCAPTCHA is used on the login page.
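To make the cookie flow concrete, here is a minimal sketch of what a cookie jar does during login, written with only Node's built-in https and querystring modules (the request package has since been deprecated). The host, the /login and /account paths, and the username/password field names are assumptions for illustration; check the real login form for the actual values. Treat it as a sketch under those assumptions, not a drop-in replacement for Request's jar.

```javascript
// Minimal sketch of what a cookie jar does during a login flow.
// Uses only Node.js built-ins; the host, paths, and form field names in the
// usage comment at the bottom are hypothetical placeholders.
const https = require('https');
const querystring = require('querystring');

// Keep only the name=value part of each Set-Cookie header
// (attributes such as Path, Expires, or HttpOnly are ignored in this sketch).
function storeCookies(jar, setCookieHeaders = []) {
  for (const header of setCookieHeaders) {
    const pair = header.split(';')[0];
    const eq = pair.indexOf('=');
    if (eq > 0) jar[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  }
  return jar;
}

// Build the Cookie request header from everything stored so far.
function cookieHeader(jar) {
  return Object.entries(jar)
    .map(([name, value]) => `${name}=${value}`)
    .join('; ');
}

// Send one request, replaying stored cookies and remembering any new ones.
function send(jar, options, body) {
  return new Promise((resolve, reject) => {
    const headers = { ...options.headers };
    const cookie = cookieHeader(jar);
    if (cookie) headers.Cookie = cookie;
    if (body) {
      headers['Content-Type'] = 'application/x-www-form-urlencoded';
      headers['Content-Length'] = Buffer.byteLength(body);
    }
    const req = https.request({ ...options, headers }, (res) => {
      storeCookies(jar, res.headers['set-cookie']);
      let data = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, body: data }));
    });
    req.on('error', reject);
    req.end(body);
  });
}

// Log in with a form POST, then fetch a page that requires the session cookie.
async function scrapeProtectedPage(host, credentials) {
  const jar = {};
  const form = querystring.stringify(credentials);
  await send(jar, { host, path: '/login', method: 'POST' }, form);
  return send(jar, { host, path: '/account', method: 'GET' });
}

// Usage (hypothetical site and field names):
// scrapeProtectedPage('example.com', { username: 'me', password: 'secret' })
//   .then((page) => console.log(page.status, page.body.length));
```

Many sites answer the login POST with a redirect. https.request does not follow it, which is usually fine here because the session cookie tends to arrive on that redirect response.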
I've been working with Node.js scrapers for more than 2 years now.
I can tell you that the best choice when dealing with logins and authentication is NOT to use direct requests.
That is because you waste time building the requests by hand, and it is a much slower way to work.
Instead, use a high-level browser that you control via an API, like Puppeteer or NightmareJs.
I have a good starter and in-depth guide on How to start scraping with Puppeteer; I'm sure it will help!
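As a rough illustration of the browser-driven approach, here is a minimal Puppeteer sketch that fills in a login form and then reads a protected page. The URLs and CSS selectors in the `site` object are hypothetical and must be replaced with the real ones from the target site. The `puppeteer` module is taken as an optional second argument (defaulting to the installed package) only so the flow can be exercised with a stand-in object.

```javascript
// Minimal sketch: log in through a real browser with Puppeteer, then read a
// protected page. Every URL and selector in `site` is a hypothetical value
// supplied by the caller.
async function scrapeAfterLogin(site, puppeteer = require('puppeteer')) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(site.loginUrl);

    // Type the credentials into the login form fields.
    await page.type(site.userSelector, site.username);
    await page.type(site.passSelector, site.password);

    // Start waiting for the navigation before clicking submit, so the
    // navigation triggered by the click is not missed.
    await Promise.all([
      page.waitForNavigation(),
      page.click(site.submitSelector),
    ]);

    // The browser keeps the session cookies, so this page loads logged in.
    await page.goto(site.protectedUrl);
    return await page.content();
  } finally {
    // Always release the browser, even if a step above throws.
    await browser.close();
  }
}

// Usage (hypothetical values):
// scrapeAfterLogin({
//   loginUrl: 'https://example.com/login',
//   protectedUrl: 'https://example.com/account',
//   userSelector: '#username',
//   passSelector: '#password',
//   submitSelector: 'button[type=submit]',
//   username: 'me',
//   password: 'secret',
// }).then((html) => console.log(html.length));
```

Because the browser handles cookies, redirects, and any JavaScript on the login page itself, none of the request details have to be reproduced by hand.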