 

How to scrape AJAX-loaded content with jsoup [closed]

I have been using jsoup for scraping, and it works perfectly until a page relies on AJAX and JavaScript to display its content.

Does anyone have a clue how to scrape content that is displayed by AJAX or JavaScript after the page has finished loading?

Pankaj Wanjari asked Nov 18 '25 00:11



2 Answers

You can use a headless browser such as PhantomJS.

PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

To make your work easier, you could use CasperJS.

CasperJS is a companion for PhantomJS which brings a greatly improved API to ease the creation of scraping and automation workflows.

These tools are very useful when you have to scrape websites with dynamic content, for instance, sites whose content is only displayed after JavaScript has run (sometimes including AJAX calls).

You can see an example of how CasperJS works here:
CasperJs and Jquery with chained Selects
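Whichever tool does the rendering, the scraper has to wait until the AJAX call has actually filled in the page before reading it; CasperJS provides waitFor and waitForSelector for exactly this. As a rough Java sketch of the same polling idea (the class and method names are hypothetical, and the "AJAX response" is simulated with a background thread rather than coming from a real page):

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

public class WaitFor {

    // Polls `condition` every `interval` until it returns true or `timeout` elapses.
    // Returns true if the condition became true in time, false on timeout or interruption.
    static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison of System.nanoTime() values.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated page: the "AJAX response" arrives about 200 ms after the initial load.
        AtomicReference<String> content = new AtomicReference<>();
        Thread fakeAjax = new Thread(() -> {
            try {
                Thread.sleep(200);
                content.set("<li>loaded item</li>");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fakeAjax.start();

        boolean loaded = waitFor(() -> content.get() != null,
                Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(loaded ? "content: " + content.get() : "timed out");
    }
}
```

In a real scraper the condition would check the rendered DOM instead, for example whether a particular element exists yet. Keeping a timeout means a request that never completes doesn't hang the whole job.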

Hemerson Varela answered Nov 19 '25 14:11



You can't do it directly with JSoup. You'll need a headless browser, which is a much more complex thing. There are headless versions of Firefox, Safari, and others. Searches for "headless X" (where X is the browser engine you want to use) should turn up some useful projects.
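As one hedged example of the headless-browser route: Chrome/Chromium can run headless and print the DOM after the page's scripts have run via its --headless and --dump-dom switches, and the resulting string can then be handed to jsoup. A minimal Java sketch that shells out to it, assuming a Chrome or Chromium binary is installed (the executable name, e.g. chromium or google-chrome, varies by system, and the class name is hypothetical):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class HeadlessDump {

    // Command line for headless Chrome/Chromium that prints the DOM after scripts run.
    // The executable name is an assumption ("chromium", "google-chrome", or a full path).
    static List<String> buildCommand(String browserExecutable, String url) {
        return List.of(browserExecutable, "--headless", "--dump-dom", url);
    }

    // Runs the browser and returns what it printed to stdout: the rendered HTML.
    static String renderedHtml(String browserExecutable, String url)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(browserExecutable, url))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // drop browser log noise
                .start();
        // Read stdout fully before waiting, so a large page can't fill the pipe and block.
        String html = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("browser exited with status " + exitCode);
        }
        return html;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: java HeadlessDump <browser-executable> <url>");
            return;
        }
        String html = renderedHtml(args[0], args[1]);
        // From here the usual jsoup workflow applies, e.g. Jsoup.parse(html).select("...").
        System.out.println(html);
    }
}
```

Content that only arrives after the page's load event may still be missing from the dump; for pages like that, a browser driven by a scripting tool with explicit waits (such as the CasperJS approach in the other answer) is the more reliable option.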

T.J. Crowder answered Nov 19 '25 13:11



