I have been playing with the idea of building a simple screen-scraper with jQuery, and I am wondering if the following is possible.
I have a simple HTML page and am attempting (if this is possible) to grab the contents of all of the list items from another page, like so:
Main Page:
<!-- jQuery -->
<script type='text/javascript'>
$(document).ready(function(){
    $.getJSON("[URL to other page]", function(data){
        //Iterate through the <li> inside of the URL's data
        $.each(data.items, function(item){
            $("<li/>").value().appendTo("#data");
        });
    });
});
</script>

<!-- HTML -->
<html>
<body>
    <div id='data'></div>
</body>
</html>
Other Page:
//Html
<body>
    <p><b>Items to Scrape</b></p>
    <ul>
        <li>I want to scrape what is here</li>
        <li>and what is here</li>
        <li>and here as well</li>
        <li>and append it in the main page</li>
    </ul>
</body>
So, is it possible using jQuery to pull all of the list item contents from an external page and append them inside of a div?
Screen scraping has a variety of uses, both ethical and unethical. For example, a banking app might use it to gather data from a user's multiple accounts, while an attacker might use it to steal data from applications.
jQuery is a lightweight, "write less, do more", JavaScript library. The purpose of jQuery is to make it much easier to use JavaScript on your website. jQuery takes a lot of common tasks that require many lines of JavaScript code to accomplish, and wraps them into methods that you can call with a single line of code.
A program that extracts data from websites is called a web scraper. There are mainly two parts to web scraping: getting the data, using request libraries or a headless browser, and parsing it to extract what you need.
Use $.ajax to load the other page into a variable, then create a temporary element and use .html() to set its contents to the value returned. Loop through the element's children of nodeType 1 and keep their first children's nodeValues. If the external page is not on your web server, you will need to proxy the file through your own web server.
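For the proxy part, a minimal sketch might look like the following. It assumes your own server runs Node.js 18 or later (for the global fetch) and is the same server that serves the main page. The names createScrapeProxy and TARGET_URL, the example.com address, and port 8080 are all made up for illustration; only the /thePageToScrape.html path comes from the answer.

```javascript
const http = require("node:http");

// Hypothetical address of the external page; replace with the real one.
const TARGET_URL = "http://example.com/other-page.html";

// Build a server that relays the external page from a same-origin path.
// (Illustrative helper, not part of jQuery or of any library.)
function createScrapeProxy(targetUrl) {
  return http.createServer(async (req, res) => {
    // Only the one path used by the client-side code is relayed.
    if (req.url !== "/thePageToScrape.html") {
      res.writeHead(404, { "Content-Type": "text/plain" });
      res.end("Not found");
      return;
    }
    try {
      // Global fetch is available in Node.js 18 and later.
      const upstream = await fetch(targetUrl);
      const html = await upstream.text();
      res.writeHead(upstream.status, { "Content-Type": "text/html; charset=utf-8" });
      res.end(html);
    } catch (err) {
      // The external site was unreachable or the request failed.
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Upstream request failed");
    }
  });
}

// To run it (port 8080 is an arbitrary choice):
// createScrapeProxy(TARGET_URL).listen(8080);
```

The page would then request /thePageToScrape.html from your own server rather than from the external site, so the browser's same-origin policy no longer gets in the way.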
Something like this:
$.ajax({
    url: "/thePageToScrape.html",
    dataType: 'text',
    success: function(data) {
        var elements = $("<div>").html(data)[0].getElementsByTagName("ul")[0].getElementsByTagName("li");
        for(var i = 0; i < elements.length; i++) {
            var theText = elements[i].firstChild.nodeValue;
            // Do something here
        }
    }
});
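The "loop through the children of nodeType 1" step could be pulled out into a small helper, roughly like this. It is only a sketch: collectItemText is a made-up name, and it assumes you only want the text that sits directly at the start of each item, so an item whose first child is another element (such as a <b>) is skipped.

```javascript
// Collect the text of the first child node of every element child of `parent`.
// `parent` can be any DOM node, for example the <ul> from the scraped page.
// (Hypothetical helper for illustration; not a jQuery function.)
function collectItemText(parent) {
  var texts = [];
  var children = parent.childNodes;
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    // nodeType 1 is an element node; skip text nodes, comments and empty elements.
    if (child.nodeType !== 1 || !child.firstChild) {
      continue;
    }
    var value = child.firstChild.nodeValue;
    // nodeValue is null when the first child is itself an element.
    if (value !== null) {
      texts.push(value);
    }
  }
  return texts;
}

// Browser-only usage inside the success callback, assuming jQuery is loaded
// and the main page contains <div id="data">:
// var list = $("<div>").html(data).find("ul")[0];
// $.each(collectItemText(list), function (index, text) {
//   $("<li>").text(text).appendTo("#data");
// });
```

Using .text(text) rather than .html(text) when building the new items means the scraped strings are inserted as plain text, not as markup.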