I want to crawl a page, find the hyperlinks on that page, follow those hyperlinks, and capture data from the pages they lead to.

Is it possible to write a web crawler in JavaScript?

2 Answers

Generally, browser JavaScript can only crawl within the domain of its origin, because fetching pages would be done via Ajax, which is restricted by the Same-Origin Policy.

If the page running the crawler script is on www.example.com, then that script can crawl all the pages on www.example.com, but not the pages of any other origin (unless an exception applies, e.g., the other server sends an Access-Control-Allow-Origin header that permits the crawler's origin).
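
To make the restriction concrete, here is a minimal sketch of how a crawler script could decide which discovered links it is allowed to fetch from the page it runs on. The function name `sameOriginLinks` and the example URLs are illustrative assumptions, not part of any library; only the standard `URL` class is used.

```javascript
// Hypothetical helper: keep only links that share the page's origin
// (scheme + host + port), which is what the Same-Origin Policy compares.
function sameOriginLinks(pageUrl, hrefs) {
  const pageOrigin = new URL(pageUrl).origin;
  const result = [];
  for (const href of hrefs) {
    let resolved;
    try {
      // Resolve relative links against the page URL.
      resolved = new URL(href, pageUrl);
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    if (resolved.origin === pageOrigin) {
      result.push(resolved.href);
    }
  }
  return result;
}
```

Note that `https://www.example.com` and `http://www.example.com` count as different origins under this comparison, because the scheme differs.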

If you really want to write a fully featured crawler in browser JS, you could write a browser extension: for example, Chrome extensions are packaged web applications that run with special permissions, including cross-origin Ajax. The difficulty with this approach is that you'll have to write multiple versions of the crawler if you want to support multiple browsers. (If the crawler is just for personal use, that's probably not an issue.)

133

answered Oct 12 '22 11:10

apsillers

If you use server-side JavaScript, it is possible. You should take a look at Node.js.

An example of a crawler can be found at the link below:

http://www.colourcoding.net/blog/archive/2010/11/20/a-node.js-web-spider.aspx
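
For orientation, here is a minimal sketch of what such a server-side crawler might look like. It is not the code from the linked article. It assumes Node.js 18 or later (for the built-in `fetch`), and the names `extractLinks`, `crawl`, `defaultFetchPage`, and the `maxPages` and `fetchPage` options are hypothetical. Links are found with a rough regular expression rather than a real HTML parser, and the "captured data" is just each page's `<title>`.

```javascript
// Extract href values from anchor tags with a simple regular expression.
// This is a rough heuristic, not a full HTML parser.
function extractLinks(html, baseUrl) {
  const links = [];
  const pattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    try {
      const url = new URL(match[1], baseUrl); // resolve relative links
      url.hash = ""; // treat page#a and page#b as the same page
      if (url.protocol === "http:" || url.protocol === "https:") {
        links.push(url.href);
      }
    } catch {
      // ignore hrefs that cannot be parsed as URLs
    }
  }
  return links;
}

// Assumed default loader: uses the global fetch available in Node.js 18+.
async function defaultFetchPage(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}

// Breadth-first crawl from startUrl, recording at most maxPages pages.
// fetchPage is injectable so the crawler can be exercised without a network.
async function crawl(startUrl, { maxPages = 10, fetchPage = defaultFetchPage } = {}) {
  const queue = [new URL(startUrl).href];
  const seen = new Set(queue);
  const pages = new Map(); // url -> captured data (here: the <title> text)

  while (queue.length > 0 && pages.size < maxPages) {
    const url = queue.shift();
    let html;
    try {
      html = await fetchPage(url);
    } catch {
      continue; // skip pages that fail to load
    }
    const title = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
    pages.set(url, title ? title[1].trim() : null);

    // Follow every link that has not been queued before.
    for (const link of extractLinks(html, url)) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return pages;
}
```

Something like `crawl("http://www.example.com/", { maxPages: 20 })` would return a `Map` from each visited URL to its title. Because the code runs on the server, the Same-Origin Policy does not apply, so it follows links to other hosts as well.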

answered Oct 12 '22 12:10

Bogdan Emil Mariesan

Tags:

javascript

web-crawler

Ashwin Mendon
