How can I remove non-breaking spaces from a JSoup 'Document'?

How can I remove these:

<td>&nbsp;</td>

or

<td width="7%">&nbsp;</td>

from my JSoup 'Document'? I've tried several approaches, but these non-breaking space characters don't match any of the usual JSoup expressions or selectors.

Nick Betcher asked Aug 12 '11 01:08


1 Answer

The HTML entity &nbsp; (Unicode character NO-BREAK SPACE, U+00A0) can be represented in Java by the character \u00a0. Assuming you want to remove every element that contains this character in its own text (rather than every line, as you said in a comment), the following ought to work:

document.select(":containsOwn(\u00a0)").remove();
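As a side note on why the usual whitespace handling misses this character, here is a small sketch using only the Java standard library. The class name NbspCheck and the helper isOnlyNbsp are made up for illustration and are not part of JSoup.

```java
public class NbspCheck {
    // NO-BREAK SPACE, the character that the &nbsp; entity decodes to.
    public static final char NBSP = '\u00a0';

    /** Hypothetical helper: true if s is non-empty and holds nothing but NBSP characters. */
    public static boolean isOnlyNbsp(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != NBSP) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String cell = String.valueOf(NBSP);
        // trim() only strips characters up to U+0020, so the NBSP survives.
        System.out.println(cell.trim().isEmpty());        // false
        // Java deliberately excludes non-breaking spaces from isWhitespace().
        System.out.println(Character.isWhitespace(NBSP)); // false
        // It is still a Unicode space separator, so isSpaceChar() accepts it.
        System.out.println(Character.isSpaceChar(NBSP));  // true
        // The default regex \s class does not cover it either.
        System.out.println(cell.matches("\\s"));          // false
        System.out.println(isOnlyNbsp(cell));             // true
    }
}
```

So any check built on trim(), isWhitespace() or a plain \s regex will treat such a cell as non-empty, which is why you need to name the character explicitly.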

If you really do mean to remove the entire line, your best bet is to scan the HTML yourself, line by line.
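A rough sketch of such a line-by-line scan, using only the standard library, might look like the following. It assumes each unwanted cell sits alone on its own line, as in the question. The class name NbspLineFilter, the method dropNbspCellLines and the regex are all made up for illustration, and the regex is deliberately narrow rather than a general HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class NbspLineFilter {
    // Matches a line that holds only a <td> (with any attributes) whose sole
    // content is one or more &nbsp; entities or literal U+00A0 characters.
    private static final Pattern NBSP_CELL_LINE = Pattern.compile(
            "^\\s*<td\\b[^>]*>(?:&nbsp;|\\xA0)+</td>\\s*$",
            Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with every line that matches NBSP_CELL_LINE dropped. */
    public static String dropNbspCellLines(String html) {
        List<String> kept = new ArrayList<>();
        // Split on LF or CRLF; the -1 limit keeps trailing empty lines.
        for (String line : html.split("\\r?\\n", -1)) {
            if (!NBSP_CELL_LINE.matcher(line).matches()) {
                kept.add(line);
            }
        }
        // Lines are rejoined with LF, so CRLF input comes back LF-terminated.
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String html = "<tr>\n<td>&nbsp;</td>\n<td width=\"7%\">&nbsp;</td>\n<td>data</td>\n</tr>";
        System.out.println(dropNbspCellLines(html));
    }
}
```

You would run this over the raw HTML string before handing it to JSoup for parsing. If a cell shares its line with other markup, this filter leaves that line untouched.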

BalusC answered Sep 30 '22 20:09
