How can I remove these:
<td>&nbsp;</td>
or
<td width="7%">&nbsp;</td>
from my JSoup 'Document'? I've tried many methods, but these non-breaking space characters do not match anything with normal JSoup expressions or Selectors.
The HTML entity &nbsp; (Unicode character NO-BREAK SPACE, U+00A0) can be represented in Java by the character \u00a0. Assuming that you want to remove every element that contains that character in its own text (and thus not every line, as you said in a comment), the following ought to work:
document.select(":containsOwn(\u00a0)").remove();
If you really mean to remove the entire line, then your best bet is to scan the HTML yourself, line by line.
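A minimal sketch of that line-by-line scan, using only the Java standard library rather than JSoup. The class name NbspLineFilter and the method removeEmptyCellLines are made up for illustration. It assumes each empty cell sits alone on its own line and contains nothing but no-break spaces (either the raw U+00A0 character or the literal &nbsp; entity) and ordinary whitespace. Real-world HTML is often messier than that, so treat it as a starting point, not a general HTML cleaner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JSoup.
class NbspLineFilter {

    // A line consisting of one <td ...> cell whose content is one or more
    // no-break spaces (raw U+00A0 or the &nbsp; entity), optionally padded
    // with ordinary whitespace. Note that \s alone does not match U+00A0
    // in Java regular expressions, so it is listed explicitly.
    private static final Pattern EMPTY_CELL_LINE = Pattern.compile(
            "\\s*<td\\b[^>]*>\\s*(?:(?:\\u00a0|&nbsp;)\\s*)+</td>\\s*",
            Pattern.CASE_INSENSITIVE);

    // Returns the input with every matching line dropped.
    // Remaining lines are re-joined with "\n".
    static String removeEmptyCellLines(String html) {
        List<String> kept = new ArrayList<>();
        for (String line : html.split("\\R")) {
            if (!EMPTY_CELL_LINE.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String html = "<tr>\n"
                + "<td>\u00a0</td>\n"
                + "<td width=\"7%\">&nbsp;</td>\n"
                + "<td>keep</td>\n"
                + "</tr>";
        System.out.println(removeEmptyCellLines(html));
    }
}
```

The cleaned string can then be handed to Jsoup.parse(...) as usual. Cells that contain real text, or that share a line with other markup, are left untouched by this pattern.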