
How to create a web crawler with Node.js? [closed]

I recently got interested in how search engines work and found out that they use "bots" or "web crawlers". I immediately started wondering how these things work, and I want to create one! So, first off: how do you make a program that requests a page from a server? It would be awesome if you gave me a simple example in JavaScript (I'm running it as a normal scripting language using Node). Next, is there a Node module that lets me interpret HTML, that is, build a DOM so I can cycle through all the links and so on? Correct me if I'm wrong, but I guess that's how it's done. Examples in C++, C or Python are welcome as well, although I'd prefer JS or Python because I'm more familiar with high-level scripting languages.

corazza asked Oct 09 '22 03:10



1 Answer

  • Getting HTTP pages: Node's built-in http.get (its documentation includes an example)
  • DOM documents: jsdom (its documentation also includes examples)
Tom van der Woerdt answered Oct 12 '22 03:10
