
Simple Screen Scraping using jQuery

I have been playing with the idea of building a simple screen scraper with jQuery, and I am wondering whether the following is possible.

I have a simple HTML page and am trying (if this is possible) to grab the contents of all of the list items from another page, like so:

Main Page:

<!-- jQuery -->
<script type='text/javascript'>
$(document).ready(function(){
    $.getJSON("[URL to other page]", function(data){
        //Iterate through the <li> inside of the URL's data
        $.each(data.items, function(item){
            $("<li/>").value().appendTo("#data");
        });
    });
});
</script>

<!-- HTML -->
<html>
    <body>
       <div id='data'></div>
    </body>
</html>

Other Page:

<!-- HTML -->
<body>
    <p><b>Items to Scrape</b></p>
    <ul>
        <li>I want to scrape what is here</li>
        <li>and what is here</li>
        <li>and here as well</li>
        <li>and append it in the main page</li>
    </ul>
</body>

So, is it possible using jQuery to pull all of the list item contents from an external page and append them inside of a div?

Rion Williams asked Apr 14 '11 18:04



1 Answer

Use $.ajax to load the other page as text, then create a temporary element and use .html() to set its contents to the returned markup. Loop through the <li> elements inside it (element nodes, nodeType 1) and read the nodeValue of each one's first child, which is the item's text node. If the external page is not on your web server, the browser's same-origin policy will normally block the request, so you will need to proxy the file through your own web server.

Something like this:

$.ajax({
    url: "/thePageToScrape.html",
    dataType: 'text',
    success: function(data) {
        var elements = $("<div>").html(data)[0].getElementsByTagName("ul")[0].getElementsByTagName("li");
        for (var i = 0; i < elements.length; i++) {
            var theText = elements[i].firstChild.nodeValue;
            // Do something here
        }
    }
});
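To connect this back to the question, here is one way the "Do something here" step might be filled in so that each scraped item ends up inside the #data div. This is only a sketch: collectItemTexts is a hypothetical helper name (not part of jQuery), the URL is the placeholder used above, and it assumes jQuery is loaded and that the first <ul> on the other page is the one you want.

```javascript
// Hypothetical helper: returns the trimmed text of each list item whose
// first child is a text node (nodeType 3). Items that are empty or that
// start with a nested element are skipped.
function collectItemTexts(listItems) {
    var texts = [];
    for (var i = 0; i < listItems.length; i++) {
        var first = listItems[i].firstChild;
        if (first && first.nodeType === 3) {
            texts.push(first.nodeValue.trim());
        }
    }
    return texts;
}

// Browser-only wiring; skipped when there is no window or jQuery.
if (typeof window !== "undefined" && typeof jQuery !== "undefined") {
    jQuery(function ($) {
        $.ajax({
            url: "/thePageToScrape.html", // placeholder URL from the answer
            dataType: "text",
            success: function (data) {
                // Parse the markup in a detached <div> and take the
                // <li> elements of the first <ul> as a plain array.
                var items = $("<div>").html(data).find("ul").first().find("li").get();

                // Build a new list inside the #data div from the question.
                var $list = $("<ul>").appendTo("#data");
                $.each(collectItemTexts(items), function (index, text) {
                    // .text() inserts the value as plain text, not markup.
                    $("<li>").text(text).appendTo($list);
                });
            }
        });
    });
}
```

Keeping the text extraction in its own function means it can be checked without a browser, and using .text() rather than .html() when appending avoids re-interpreting scraped content as markup.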
Ry- answered Sep 22 '22 00:09
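The answer's note about proxying an external page through your own web server could look roughly like the following if that server happens to run Node.js. Everything here is an assumption made for illustration: the /proxy path, the page query parameter, the ALLOWED_PAGES table and its example.com URL, and the function names are all hypothetical, and the global fetch it relies on requires Node 18 or later. Any server-side language that can fetch a URL and echo the body would do the same job.

```javascript
// Hypothetical allowlist: short keys mapped to the external pages the
// proxy is willing to fetch. Replace the URL with the real target.
const ALLOWED_PAGES = {
    items: "https://example.com/items.html"
};

// Map a request URL such as "/proxy?page=items" to an allowlisted
// external URL, or return null for anything else.
function resolveTarget(requestUrl) {
    const parsed = new URL(requestUrl, "http://localhost");
    if (parsed.pathname !== "/proxy") {
        return null;
    }
    const key = parsed.searchParams.get("page");
    return Object.prototype.hasOwnProperty.call(ALLOWED_PAGES, key)
        ? ALLOWED_PAGES[key]
        : null;
}

// Request handler: fetch the external page server-side and return its
// markup as text, so the browser only ever talks to its own origin.
async function handleRequest(req, res) {
    const target = resolveTarget(req.url);
    if (!target) {
        res.writeHead(404, { "Content-Type": "text/plain; charset=utf-8" });
        res.end("Unknown page");
        return;
    }
    try {
        const upstream = await fetch(target); // global fetch, Node 18+
        const body = await upstream.text();
        res.writeHead(upstream.ok ? 200 : 502, {
            "Content-Type": "text/plain; charset=utf-8"
        });
        res.end(body);
    } catch (err) {
        res.writeHead(502, { "Content-Type": "text/plain; charset=utf-8" });
        res.end("Upstream request failed");
    }
}

// Start the proxy on the given port. The http module is required here
// so that nothing is loaded or started until this function is called.
function startProxy(port) {
    const http = require("http");
    return http.createServer(handleRequest).listen(port);
}
```

With something like startProxy(3000) running on the same origin as the main page, the $.ajax call above would request "/proxy?page=items" instead of the external address. The allowlist is a deliberate choice: forwarding arbitrary user-supplied URLs would turn the server into an open proxy.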