Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Jsoup Delay due to Streaming Website

A question regarding Jsoup: I am building a tool that fetches prices from a website. However, this website has streaming content. If I browse manually, I see the prices of 20 mins ago and have to wait about 3 secs to get the current price. Is there any way I can make some kind of delay in Jsoup to be able to obtain the prices in the streaming section? I am using this code:

conn = Jsoup.connect(link).userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36");

conn.timeout(5000);

doc = conn.get();
like image 660
Wouter Avatar asked Oct 17 '13 16:10

Wouter


2 Answers

As mentioned in the comments, the site is most likely using some type of scripting that just won't work with Jsoup. Since Jsoup just get the initial HTML response and does not execute any javascript.

I wanted to give you some more guidence though on where to go now. The best bet, in these cases, is to move to another platform for these types of sites. You can migrate to HTMLUnit which is a headless browser, or Selenium which can use HTMLUnit or a real browser like Firefox or Chrome. I would recommend Selenium if you think you will ever need to move past HTMLUnit as HTMLUnit can sometimes be less stable a browser compared to consumer browsers Selenium can support. You can use Selenium with the HTMLUnit driver giving you the option to move to another browser seamlessly later.

like image 113
Andrew Phillips Avatar answered Nov 02 '22 00:11

Andrew Phillips


You can Use a JavaFX WebView with javascript enabled. After waiting the two seconds, you can extract the contents and pass them to JSoup.

(After loading your url into your WebView using the example above)
String text=view.getEngine() executeScript("document.documentElement.outerHTML");
Document doc = Jsoup.parse(html);
like image 1
Emily Crutcher Avatar answered Nov 02 '22 00:11

Emily Crutcher