
Finding all pages on domain with NodeJS

I'm trying to find all the pages on a domain with Node. I searched Stack Overflow, but all I found was this thread for Ruby: Find all the web pages in a domain and its subdomains. I have the same question, but for Node. I've also googled it, but all I find are scrapers that have to be given the links to scrape rather than discovering them on their own. I also searched for things like "sitemap generator", "webpage robot", "automatic scraper", and "getting all pages on domain with Node", but nothing useful came up.

I have a scraper that needs an array of links to process. For example, given the page www.example.com/products/, I want to find all existing sub-pages, e.g. www.example.com/products/product1.html, www.example.com/products/product2.html, etc.

Could you give me a hint on how I can implement this in Node?

Jevgeni Jostin asked Feb 07 '26 16:02


1 Answer

Have a look at Crawler (https://www.npmjs.org/package/crawler). You can use it to crawl the website and save the links.

Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!
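
If you'd rather see the idea without a dependency, here is a rough sketch of what "crawl the website and save the links" boils down to. It does not use the Crawler package; it assumes Node 18+ (for the global fetch) and swaps in a simple regex for href extraction instead of a real HTML parser. The names extractLinks, crawl, fetchPage and maxPages are made up for this example.

```javascript
// Sketch only: a breadth-first crawl limited to one origin and path prefix.
// extractLinks, crawl and fetchPage are illustrative names, not a library API.

// Collect absolute, same-origin links from the href attributes in an HTML string.
// The regex is a simplification; a real HTML parser copes better with odd markup.
function extractLinks(html, pageUrl) {
  const origin = new URL(pageUrl).origin;
  const links = new Set();
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    let url;
    try {
      url = new URL(match[1], pageUrl); // resolve relative links against the page
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    url.hash = ''; // treat page#a and page#b as the same page
    if (url.origin === origin) links.add(url.href);
  }
  return [...links];
}

// Default page loader: returns the body of HTML responses, '' for anything else.
async function defaultFetchPage(url) {
  const response = await fetch(url);
  const type = response.headers.get('content-type') || '';
  return response.ok && type.includes('text/html') ? response.text() : '';
}

// Follow links starting at startUrl and return every URL found under that prefix.
// maxPages caps how many URLs are collected so a large site cannot run forever.
async function crawl(startUrl, { fetchPage = defaultFetchPage, maxPages = 500 } = {}) {
  const start = new URL(startUrl).href;
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    let html = '';
    try {
      html = await fetchPage(current);
    } catch {
      // unreachable page: leave html empty and keep going
    }
    for (const link of extractLinks(html, current)) {
      if (seen.size >= maxPages) break;
      if (link.startsWith(start) && !seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return [...seen];
}
```

Calling crawl('https://www.example.com/products/') resolves to an array of the URLs it found under /products/, which you can hand straight to your scraper. Note that this only finds pages that are actually linked from somewhere under the start URL, requests pages one at a time, and ignores robots.txt, so treat it as a starting point.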

LiamB answered Feb 09 '26 11:02


