How to keep a web crawler running?

I want to write my own web crawler in JS. I am thinking of using a Node.js package such as https://www.npmjs.com/package/js-crawler

The objective is to run a crawl every 10 minutes, so that my crawler fetches data from a website at that interval.

I understand that I could write an infinite loop such as:

var keepRunning = true;
while (keepRunning) {
  // fetch data and process it every 10 minutes
}

This would work perfectly fine as long as my computer is on all the time and I am on the website.

However, if I shut down my computer, I can imagine that it will not work any more. So what kind of solution should I consider to keep a script running all the time, even when the computer is shut down?

JohnAndrews asked Nov 09 '22 12:11



1 Answer

Use a cron job (or a similar scheduler) to control when your script runs (every x minutes, at set times, etc.), and deploy your app to a hosted server that stays online, so it no longer depends on your own computer. Several hosting services have offered this for Node apps, some of them for free:

  • C9
  • Heroku
  • Nodejitsu
Dan Moldovan answered Nov 14 '22 22:11
