
Jsoup select is not returning all nodes

Tags: java, jsoup

I have just started using jsoup with this site and something weird is happening.

All I want is to select the text under the title column, which sits in the following HTML:

<div class="Table1_A1 grow clear-fix">
    <div class="grd-col grd-col-1a"> … </div>
    <div class="grd-col grd-col-2b">
        <p>
            <span class="T1">
                <a href="...."> TITLE TEXT IS HERE
                </a>
            </span>
        </p>
    </div>
    ...
</div>

Looking at this HTML structure, I came up with the following jsoup selector:

try {
    Document htmlDocument = Jsoup.connect(url).get();
    Elements as = htmlDocument.select("div.grow > div.grd-col-2b > p > span.T1 > a");
    System.out.println(as.html());
} catch (IOException e) {
    e.printStackTrace();
}

Here is the thing: it only prints the titles up to "ASAP", but there are many more after that, and they simply don't show up. So I am left wondering: does jsoup's select() have a limit on the number of nodes it returns? I have no idea how to get around this, so any help is much appreciated.

Chayemor asked Dec 16 '13 12:12


1 Answer

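Jsoup places no limit on the number of elements a select returns, but it does apply a default limit of 1MB to the size of the response body. Anything beyond that limit is cut off before parsing, so the later nodes never make it into the document. Raising the limit should fix the problem: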

Document doc = Jsoup.connect(url).timeout(60000).maxBodySize(10*1024*1024).get();
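To see why a body-size cap makes later rows disappear, here is a minimal sketch in plain Java that does not use jsoup at all. The class `BodyLimitDemo` and its helpers `buildPage`, `truncate`, and `countLinks` are hypothetical names made up for illustration, and the link counting is only a crude stand-in for what a real selector would match. It builds a page with many rows shaped like the one in the question, cuts it off at a byte limit, and counts how many complete links survive.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BodyLimitDemo {

    // Hypothetical page generator: one row per title, shaped like the question's HTML.
    static String buildPage(int rows) {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 0; i < rows; i++) {
            sb.append("<div class=\"grd-col grd-col-2b\"><p><span class=\"T1\">")
              .append("<a href=\"#\">Title ").append(i).append("</a>")
              .append("</span></p></div>");
        }
        return sb.append("</body></html>").toString();
    }

    // Stand-in for a response-body cap: keep only the first maxBytes bytes.
    // In this sketch, a limit of 0 (or less) is treated as "no limit".
    static String truncate(String body, int maxBytes) {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        if (maxBytes <= 0 || bytes.length <= maxBytes) {
            return body;
        }
        return new String(Arrays.copyOf(bytes, maxBytes), StandardCharsets.UTF_8);
    }

    // Counts complete <a ...>...</a> pairs; a rough proxy for matchable nodes.
    static int countLinks(String html) {
        int count = 0;
        int from = 0;
        while (true) {
            int open = html.indexOf("<a ", from);
            if (open < 0) break;
            int close = html.indexOf("</a>", open);
            if (close < 0) break; // link was cut off by the cap
            count++;
            from = close + 4;
        }
        return count;
    }

    public static void main(String[] args) {
        String page = buildPage(1000);
        int limit = page.length() / 2; // assumed cap: half of the page
        System.out.println("links in full page:   " + countLinks(page));
        System.out.println("links in capped page: " + countLinks(truncate(page, limit)));
    }
}
```

The capped page yields fewer links than the full one, which mirrors the symptom in the question: the selector is fine, but the parser never sees the tail of the document. In jsoup itself, passing 0 to maxBodySize removes the cap entirely, which is an alternative to guessing a large enough value.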
Andrey Chaschev answered Oct 27 '22 00:10