 

Open webpage and parse it using JavaScript

I know JavaScript can open a link in a new window, but is it possible to open a webpage without opening it in a window or displaying it to the user? What I want to do is parse that webpage for some text and store that text in variables.

Is this possible without any help from server-side languages? If so, please point me in the right direction.

Thanks all

Abs asked Feb 28 '09 11:02


People also ask

What is jQuery.parseHTML?

jQuery.parseHTML uses native methods to convert a string into a set of DOM nodes, which can then be inserted into the document. These methods do render all trailing or leading text (even if that's just whitespace).

How do you parse an element in HTML?

If you just want to parse HTML and your HTML is intended for the body of your document, you could do the following:

    var div = document.createElement('div'); // (1) create a detached container
    div.innerHTML = markup;                  // (2) let the browser parse the markup
    var result = div.childNodes;             // (3) read back the parsed nodes

This gives you a collection of child nodes and should work not just in IE8 but even in IE6-7.


1 Answer

You can use an XMLHttpRequest object to do this. Here's a simple example:

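    // Third argument false = synchronous: simple, but it blocks the page until done
    var req = new XMLHttpRequest();
    req.open('GET', 'http://www.mydomain.com/', false);
    req.send(null);
    if (req.status === 200) {
      console.log(req.responseText); // dump() only existed in Mozilla browsers
    }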
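In current browsers the same request can also be made asynchronously with the standard `fetch` API, which avoids blocking the page the way a synchronous XMLHttpRequest does. This is a sketch, not part of the original answer: `fetchText` is a hypothetical helper name, and its optional `fetchImpl` parameter (also hypothetical) exists only so the function can be exercised with a stub instead of a real network request.

```javascript
// Hypothetical helper: download a page's HTML without displaying it.
// Uses the standard fetch API instead of a synchronous XMLHttpRequest,
// so the page stays responsive while the request is in flight.
// fetchImpl defaults to the global fetch; it is a parameter only so the
// function can be tested with a stub.
async function fetchText(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  // Mirror the original check: only accept a 200 OK response.
  if (response.status !== 200) {
    throw new Error('Request failed with status ' + response.status);
  }
  // Resolves with the raw HTML, the equivalent of req.responseText.
  return response.text();
}

// Usage in a browser (still subject to the same-origin policy):
// fetchText('http://www.mydomain.com/').then(html => console.log(html));
```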

Once the response has loaded, you can do your parsing/scraping by applying JavaScript regular expressions to the req.responseText property.
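As a small illustration of the regular-expression approach, the sketch below pulls the contents of the `<title>` element out of a string such as `req.responseText`. The function name `extractTitle` is hypothetical, and the pattern assumes simple, reasonably well-formed markup; regular expressions cannot reliably parse arbitrary HTML, so treat this as a quick scraping aid rather than a parser.

```javascript
// Hypothetical helper: return the text of the first <title> element in an
// HTML string, or null if there is none. The pattern is deliberately simple
// and assumes reasonably well-formed markup.
function extractTitle(html) {
  // [^>]* skips any attributes on the opening tag; [\s\S]*? matches lazily
  // across line breaks up to the first closing tag; the i flag also accepts
  // upper-case tags such as <TITLE>.
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Usage with the response from the example above:
// var pageTitle = extractTitle(req.responseText);
```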

More detail...

In practice, older versions of Internet Explorer (IE 5 and 6) have no native XMLHttpRequest, so you need to do a little more to create the request object in a cross-browser way, e.g.:

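    var req;
    if (window.XMLHttpRequest) {
      // IE 7+ and all other browsers have a native XMLHttpRequest
      req = new XMLHttpRequest();
    } else if (navigator.userAgent.toLowerCase().indexOf('msie 5') === -1) {
      // IE 6
      req = new ActiveXObject('Msxml2.XMLHTTP');
    } else {
      // IE 5.x: the older ActiveX control
      req = new ActiveXObject('Microsoft.XMLHTTP');
    }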

Or use a library...

Alternatively, you can save yourself all the bother and just use a library like jQuery or Prototype to take care of this for you.

Same-origin policy may bite you though...

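Note that due to the same-origin policy, the page you request must come from the same origin (same scheme, host and port) as the page making the request. If you want to request a page from another site, you will have to proxy it through a server-side script, unless the remote server explicitly allows the request by sending CORS headers such as Access-Control-Allow-Origin (supported by modern browsers).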
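To make the rule concrete: two URLs share an origin only when their scheme, host and port all match. The sketch below compares origins with the standard `URL` class; `isSameOrigin` is a hypothetical helper name. Browsers enforce this check themselves, so the function is only a way to predict whether a plain request will be blocked when the target sends no CORS headers.

```javascript
// Hypothetical helper: true if both URLs have the same origin, i.e. an
// identical scheme, host and port. URL#origin omits default ports, so
// http://example.com and http://example.com:80 compare equal.
function isSameOrigin(pageUrl, targetUrl) {
  // A relative target URL is resolved against the page URL first.
  return new URL(targetUrl, pageUrl).origin === new URL(pageUrl).origin;
}
```

For example, a page at `http://www.mydomain.com/` may read `/other.html`, but not `http://mydomain.com/` (different host) or `https://www.mydomain.com/` (different scheme).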

Another possible workaround (now only of historical interest, since Adobe ended support for Flash Player at the end of 2020) is to use Flash to make the request, which does allow cross-domain requests if the target site grants permission with a suitably configured crossdomain.xml file.

Here's a nice article on the subject of the same-origin policy:

  • Same-Origin Policy Part 1: Why we’re stuck with things like XSS and XSRF/CSRF
Paul Dixon answered Sep 18 '22 00:09