
jsoup quotes and spaces

Tags: jsoup

I am trying to pick, using Jsoup, the paragraph inside the following HTML snippet:

<div class="abc ">
<p class="de">Very short paragraph.</p>
</div>

For that, I am using the following Java code snippet:

Elements divs = document.select("div[class=abc ]");
for (Element div : divs) {
  Log.v("iwashere", String.format("div[class=abc ]"));
  Elements ppp = document.select("p[class=de]");                   
  for (Element p : ppp) {
    Log.v("iwashere", p.text());
    break;                                                
  } 
}

The problem is that, for some reason, Jsoup doesn't seem to match "div[class=abc ]": the Log.v("iwashere", ...) output never shows up in the log.

At first, I thought that the trailing space might be the problem, so I also tried

Elements divs = document.select("div[class=abc]");

but that didn't help either.

What could be the problem in the above code?

Regex Rookie asked Apr 07 '11 04:04


2 Answers

jsoup uses CSS selectors. You want "div.abc", which matches a div element that has the class abc.

Elements divs = document.select("div.abc");
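For illustration, here is a minimal sketch of the class selector applied to the snippet from the question. The class name ClassSelectorSketch and the helper firstParagraphText are made up for this example, System.out.println stands in for the Android Log.v call, and the inner lookup is run on each div rather than on the whole document, which is an assumption about what the original loop was meant to do.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ClassSelectorSketch {

    // Hypothetical helper: returns the text of the first p.de found inside
    // a div.abc, or null if there is no such paragraph.
    static String firstParagraphText(String html) {
        Document document = Jsoup.parse(html);

        // "div.abc" matches on the class name, so extra whitespace inside
        // the class attribute value does not get in the way.
        Elements divs = document.select("div.abc");
        for (Element div : divs) {
            // Search within this div only, not the whole document.
            Element p = div.select("p.de").first();
            if (p != null) {
                return p.text();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<div class=\"abc \">\n"
                + "<p class=\"de\">Very short paragraph.</p>\n"
                + "</div>";
        System.out.println(firstParagraphText(html));
    }
}
```

The same pattern should work on Android by swapping the println for Log.v.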
Richard Schneider answered Nov 06 '22 17:11


There is a bug with the whitespace at the end of "abc_" (where "_" stands for a space character) in the attribute selector:

Elements divs = document.select("div[class=abc ]");

That's why the CSS class selector (div.abc) works: it matches on the class name rather than on the exact attribute string.
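If an attribute selector is needed for some reason, one possible workaround is jsoup's prefix form, [attr^=value], which only requires the attribute value to start with the given text. The sketch below is an illustration under that assumption, not a recommendation; the names PrefixSelectorSketch and countPrefixMatches are made up here. Note that the prefix form is looser than div.abc and would also match a class such as "abcd".

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PrefixSelectorSketch {

    // Hypothetical helper: counts the divs whose class attribute value
    // starts with "abc", regardless of any trailing whitespace.
    static int countPrefixMatches(String html) {
        Document document = Jsoup.parse(html);
        return document.select("div[class^=abc]").size();
    }

    public static void main(String[] args) {
        String html = "<div class=\"abc \">\n"
                + "<p class=\"de\">Very short paragraph.</p>\n"
                + "</div>";
        System.out.println(countPrefixMatches(html));
    }
}
```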

pachuss answered Nov 06 '22 17:11