
A way to estimate or predict Jsoup processing time of a chunk of HTML?

Some web pages that I process in Jsoup are heavy. By "heavy" I mean the page either contains lots of HTML (let's assume the page has already been downloaded), or it requires several iterations on the same document (created only once via Jsoup.parse()).

For that reason, I would like to present to the user a progress bar with a guesstimate of how much time is left.

One approach is to measure the volume of HTML (in KB or MB) and come up with a speed factor (which, unfortunately, depends entirely on the speed of the system this code runs on).
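A minimal sketch of the size-based idea, assuming processing time grows roughly linearly with input size. `ThroughputEstimator` and its methods are hypothetical names, not part of Jsoup, and the lambda in `main` is only a stand-in for the real `Jsoup.parse()` call. The speed factor is measured on the machine the code runs on, so it is not hard-coded:

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical helper: estimates processing time from input size and a measured speed factor. */
final class ThroughputEstimator {
    private double nanosPerByte = Double.NaN; // unknown until calibrate() has run

    /** Runs the work once on a sample and records how long one byte took on this machine. */
    void calibrate(String sample, Consumer<String> work) {
        int bytes = sample.getBytes(StandardCharsets.UTF_8).length;
        if (bytes == 0) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        long start = System.nanoTime();
        work.accept(sample);
        long elapsed = System.nanoTime() - start;
        nanosPerByte = (double) Math.max(elapsed, 1) / bytes;
    }

    boolean isCalibrated() {
        return !Double.isNaN(nanosPerByte);
    }

    /** Linear extrapolation: estimated milliseconds for an input of the given size. */
    double estimateMillis(long inputBytes) {
        if (!isCalibrated()) {
            throw new IllegalStateException("call calibrate() first");
        }
        return inputBytes * nanosPerByte / 1_000_000.0;
    }

    public static void main(String[] args) {
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sample.append("<p>hello</p>");
        }
        ThroughputEstimator estimator = new ThroughputEstimator();
        // Stand-in workload (assumption): replace the lambda body with Jsoup.parse(html).
        estimator.calibrate(sample.toString(), html -> html.chars().filter(c -> c == '<').count());
        System.out.printf("Estimated time for 5 MB: %.1f ms%n",
                estimator.estimateMillis(5L * 1024 * 1024));
    }
}
```

A single calibration run is noisy (JIT warm-up, garbage collection), so averaging a few runs would probably give a steadier factor.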

Another approach would be to count the number of nodes.
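A rough sketch of the node-count idea: treat each node visit as one unit of work and derive the remaining time from the average time per finished unit. `ProgressEstimator` is a hypothetical helper, not a Jsoup class. With Jsoup, the total could plausibly be taken as `doc.getAllElements().size()` multiplied by the number of passes over the document, which assumes every node costs about the same:

```java
/** Hypothetical helper: turns "units done so far" into a percentage and a remaining-time guess. */
final class ProgressEstimator {
    private final long totalUnits;
    private long doneUnits;

    ProgressEstimator(long totalUnits) {
        if (totalUnits <= 0) {
            throw new IllegalArgumentException("totalUnits must be positive");
        }
        this.totalUnits = totalUnits;
    }

    /** Records finished units, clamped so progress never exceeds 100%. */
    void advance(long units) {
        doneUnits = Math.min(totalUnits, doneUnits + Math.max(0, units));
    }

    int percentDone() {
        return (int) (100 * doneUnits / totalUnits);
    }

    /** Average time per finished unit times the units left; -1 while nothing is done yet. */
    long remainingMillis(long elapsedMillis) {
        if (doneUnits == 0) {
            return -1;
        }
        double millisPerUnit = (double) elapsedMillis / doneUnits;
        return Math.round(millisPerUnit * (totalUnits - doneUnits));
    }

    public static void main(String[] args) {
        // Assumed numbers: 4,000 nodes visited in 3 passes.
        ProgressEstimator progress = new ProgressEstimator(4_000L * 3);
        long start = System.currentTimeMillis();
        for (int node = 0; node < 12_000; node++) {
            // ... process one node here ...
            progress.advance(1);
            if ((node + 1) % 3_000 == 0) {
                long elapsed = System.currentTimeMillis() - start;
                System.out.println(progress.percentDone() + "% done, about "
                        + progress.remainingMillis(elapsed) + " ms left");
            }
        }
    }
}
```

Because the estimate is recomputed from the time actually elapsed, it adapts to the speed of the machine without needing a fixed speed factor.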

Given the obviously non-deterministic nature of this, am I asking for trouble?

Any ideas for better ways to handle this?

asked Jul 02 '12 09:07 by ih8ie8


1 Answer

Summarizing the answers so far: No, it is not possible to estimate or predict Jsoup processing time of a chunk of HTML.

The reason is that, besides Jsoup.parse() being the time-consuming component, Jsoup can run on many platforms and devices, some extremely slow and some very fast, and there is no way (yet) for Jsoup to correlate its processing stages or operations with the architecture on which it runs.

answered Nov 05 '22 23:11 by ih8ie8