
How to make Jsoup wait for the complete page to load (skip a progress page)? [duplicate]

I am trying to parse a webpage and extract data using Jsoup. However, the page is dynamic and shows a wait-for-loading page before displaying the details, so Jsoup seems to process the waiting page rather than the details page. Is there any way to make it wait until the page is fully loaded?

Thiru asked Mar 20 '16 08:03



2 Answers

If some of the content is created dynamically after the page has loaded, your best chance of parsing the full content is to use Selenium together with Jsoup:

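// Requires Selenium WebDriver and Jsoup on the classpath.
WebDriver driver = new FirefoxDriver();
driver.get("http://stackoverflow.com/");             // a real browser loads the page and runs its JavaScript
Document doc = Jsoup.parse(driver.getPageSource());  // hand the rendered HTML to Jsoup
driver.quit();                                       // close the browser when done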
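
The snippet above reads the page source as soon as `get` returns, which may still be the progress page. Below is a minimal sketch of one way to wait, assuming the details page can be recognised by some marker in its HTML: poll the page source until a readiness check passes or a timeout expires. The `PageWaiter` class, its `waitFor` method, and the `id="details"` marker are hypothetical names used only for illustration. Selenium also ships its own explicit-wait class, `WebDriverWait`, which serves the same purpose.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: polls a page-source supplier until the content looks ready.
class PageWaiter {

    /**
     * Repeatedly asks `source` for the current HTML until `ready` accepts it.
     * Throws IllegalStateException if the timeout expires first.
     */
    static String waitFor(Supplier<String> source, Predicate<String> ready,
                          Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            String html = source.get();
            if (html != null && ready.test(html)) {
                return html;
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Page not ready after " + timeout);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for driver::getPageSource: progress page twice, then the details.
        int[] calls = {0};
        Supplier<String> fakeBrowser = () ->
                ++calls[0] < 3 ? "<p>Loading...</p>" : "<div id=\"details\">done</div>";

        String html = waitFor(fakeBrowser, h -> h.contains("id=\"details\""),
                Duration.ofSeconds(5), Duration.ofMillis(50));
        System.out.println(html);
    }
}
```

With the driver from the answer, the call might look like `PageWaiter.waitFor(driver::getPageSource, h -> h.contains("id=\"details\""), Duration.ofSeconds(30), Duration.ofMillis(500))`, and the returned string can then be passed to `Jsoup.parse`.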
Florent B. answered Nov 06 '22 21:11



Most likely, the page in question is generated by JavaScript in the browser (client-side). Jsoup does not interpret JavaScript, so it cannot render such a page on its own. However, you can watch the page load in the network tab of the browser's developer tools and find out which AJAX calls are made during page load. These calls have URLs of their own, and you may be able to get all the information you need by requesting them directly. Alternatively, you can use a real browser engine to load the page, for example through a library such as Selenium WebDriver or, if you are using Java 8, the JavaFX WebView (WebKit) component.
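
The first suggestion, requesting the AJAX endpoint directly, could be sketched roughly as follows. This assumes Java 11 or later for `java.net.http.HttpClient`. The `AjaxFetch` class, the example URL, and the `X-Requested-With` header are placeholders, not details of the page in the question; the real URL and headers have to be taken from the network tab.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper for calling an AJAX endpoint found in the network tab.
class AjaxFetch {

    // Builds a GET request that resembles what the page's script would send.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .header("Accept", "application/json")
                // Assumption: some endpoints only answer requests marked as AJAX.
                .header("X-Requested-With", "XMLHttpRequest")
                .GET()
                .build();
    }

    // Sends the request and returns the raw response body.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL: replace with the call observed during page load.
        String url = args.length > 0 ? args[0] : "https://example.com/api/details?id=1";
        System.out.println(fetch(url));
    }
}
```

If the endpoint returns an HTML fragment rather than JSON, the body can be handed to `Jsoup.parse` and queried as usual.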

luksch answered Nov 06 '22 21:11
