I have used jsoup for scraping, and it works perfectly until AJAX and JavaScript come into play to render the page content.
Does anyone have a clue how to scrape content that is displayed via AJAX or JavaScript after the page has finished loading?
You can use a headless browser such as PhantomJS.
PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG.
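For instance, here is a minimal sketch of a PhantomJS script that loads a page, waits a few seconds so AJAX-driven content can finish rendering, and prints the final HTML (the URL and the 3-second delay are placeholders; you could then pipe the output into jsoup for parsing):

```javascript
// render.js -- run with: phantomjs render.js
// Loads the page, waits a fixed delay so scripts/AJAX can populate the DOM,
// then prints the fully rendered HTML to stdout.
var page = require('webpage').create();
var url = 'http://example.com/dynamic-page'; // placeholder URL

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Failed to load ' + url);
        phantom.exit(1);
        return;
    }
    // Crude but simple: give the page's JavaScript time to run.
    setTimeout(function () {
        console.log(page.content); // rendered HTML, including AJAX-inserted content
        phantom.exit();
    }, 3000);
});
```

From Java you could run such a script with ProcessBuilder and hand the captured output to Jsoup.parse(html).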
To ease your work, you could use CasperJS.
CasperJS is a companion for PhantomJS which brings a greatly improved API to ease the creation of scraping and automation workflows.
These tools are very useful when you have to scrape websites with dynamic content, for instance websites where the content is only displayed after JavaScript has run (sometimes including AJAX calls).
You can see an example of how CasperJS works here:
CasperJs and Jquery with chained Selects
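And here is a minimal sketch of the same idea with CasperJS, assuming a hypothetical page whose results are injected by AJAX into a #results element (both the URL and the selector are placeholders):

```javascript
// scrape.js -- run with: casperjs scrape.js
var casper = require('casper').create();

casper.start('http://example.com/dynamic-page');

// waitForSelector pauses the script until the element exists (or a timeout
// expires), which covers content that AJAX inserts after the initial load.
casper.waitForSelector('#results', function () {
    this.echo(this.getHTML());                // full rendered HTML
    // this.echo(this.fetchText('#results')); // or just one element's text
});

casper.run();
```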
You can't do it directly with JSoup. You'll need a headless browser, which is a much more complex thing. There are headless versions of Firefox, Safari, and others. Searches for "headless X" (where X is the browser engine you want to use) should turn up some useful projects.