Is there a good test suite or tool set that can automate website navigation -- with JavaScript support -- and collect the HTML from the pages?
Of course I can scrape straight HTML with BeautifulSoup, but that does me no good for sites that require JavaScript. :)
Whether you are building a web or a mobile application, JavaScript now has the right tools. This article explains how the vibrant Node.js ecosystem lets you scrape the web efficiently and meet most of your requirements.
A program that extracts data from websites is called a web scraper. You are going to learn to write web scrapers in JavaScript. Web scraping has two main parts, the first of which is getting the data using request libraries and a headless browser.
You could use Selenium or Watir to drive a real browser.
There are also some JavaScript-based headless browsers:
Personally, I'm most familiar with Selenium. It supports writing automation scripts in a good number of languages and has more mature tooling, such as the excellent Selenium IDE extension for Firefox, which can be used to write and run test cases and can export test scripts to many languages.
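For the Selenium route, here is a minimal sketch in Python (chosen because the question already uses BeautifulSoup). It assumes the `selenium` package, Firefox, and its driver are installed. The helper name `collect_html`, the function `main`, and the URL `https://example.com/` are placeholders made up for illustration, not part of Selenium itself.

```python
from typing import Dict, Iterable


def collect_html(driver, urls: Iterable[str]) -> Dict[str, str]:
    """Visit each URL with the given WebDriver and return {url: page HTML}."""
    pages = {}
    for url in urls:
        driver.get(url)                  # the browser loads the page and runs its scripts
        pages[url] = driver.page_source  # HTML of the page as the browser currently sees it
    return pages


def main() -> None:
    # Imported here so collect_html can be reused without Selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")    # run Firefox without a visible window
    driver = webdriver.Firefox(options=options)
    try:
        # Placeholder URL; replace with the pages you want to collect.
        pages = collect_html(driver, ["https://example.com/"])
        for url, html in pages.items():
            print(url, len(html))
    finally:
        driver.quit()                    # always shut the browser down
```

Calling `main()` should print each URL with the length of its HTML, and the returned strings can be handed to BeautifulSoup as usual. Pages that keep loading content after the initial load will probably need an explicit wait before `page_source` is read.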
Using HtmlUnit is also a possibility.
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc., just like you do in your "normal" browser.
It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.
It is typically used for testing purposes or to retrieve information from web sites.