How can I remove non-breaking spaces from a JSoup 'Document'?

Question

How can I remove these:

<td>&nbsp;</td>

or

<td width="7%">&nbsp;</td>

from my JSoup 'Document'? I've tried many methods, but these non-breaking space characters don't match anything with the usual JSoup expressions or selectors.

BalusC · Accepted Answer

The HTML entity &nbsp; (Unicode character NO-BREAK SPACE, U+00A0) can be represented in Java by the character \u00a0. Assuming you want to remove every element that contains this character in its own text (and thus not every line, as you said in a comment), the following should work:

document.select(":containsOwn(\u00a0)").remove();
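The character can also be checked for directly in plain Java. As a stdlib-only sketch (NbspDemo and isBlankIncludingNbsp are hypothetical names, not jsoup API), the snippet below shows that String.trim(), Character.isWhitespace and the default regex \s all ignore U+00A0, whereas the \h regex class (Java 8+) matches it. This may be why the character seemed not to match the usual whitespace checks.

```java
import java.util.regex.Pattern;

public class NbspDemo {
    // U+00A0 NO-BREAK SPACE, the character the &nbsp; entity decodes to
    static final char NBSP = '\u00a0';

    // Hypothetical helper: true if the text is empty or consists only of
    // ordinary whitespace and/or non-breaking spaces.
    static boolean isBlankIncludingNbsp(String text) {
        return text.replace(NBSP, ' ').trim().isEmpty();
    }

    public static void main(String[] args) {
        String cell = "\u00a0"; // the decoded text of <td>&nbsp;</td>

        System.out.println(cell.trim().isEmpty());         // false: trim() only strips chars <= U+0020
        System.out.println(Character.isWhitespace(NBSP));  // false: non-breaking spaces are excluded
        System.out.println(cell.matches("\\s*"));          // false: \s is ASCII-only by default
        System.out.println(Pattern.matches("\\h*", cell)); // true: \h includes U+00A0
        System.out.println(isBlankIncludingNbsp(cell));    // true
    }
}
```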

If you really mean to remove the entire line, your best bet is to scan the HTML yourself, line by line.
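A minimal sketch of that line-by-line approach, assuming the raw HTML is available as a String before jsoup parses it and that "line" means a physical line of the source. NbspLineFilter and removeNbspLines are hypothetical names; when reading from a file, Files.lines(path) could feed the same filter. Note that this is plain text processing, not HTML parsing, so it drops whole lines regardless of markup structure.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class NbspLineFilter {
    // Raw-HTML spellings of the non-breaking space that this sketch looks for.
    private static final List<String> NBSP_ENTITIES = Arrays.asList("&nbsp;", "&#160;", "&#xa0;");

    // Hypothetical helper: drops every line that contains a non-breaking space,
    // either as an entity (case-insensitive) or as a literal U+00A0 character.
    static String removeNbspLines(String html) {
        return Arrays.stream(html.split("\\R"))
                .filter(line -> !containsNbsp(line))
                .collect(Collectors.joining("\n"));
    }

    private static boolean containsNbsp(String line) {
        if (line.indexOf('\u00a0') >= 0) {
            return true;
        }
        String lower = line.toLowerCase(Locale.ROOT);
        return NBSP_ENTITIES.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        String html = String.join("\n",
                "<tr>",
                "<td>&nbsp;</td>",
                "<td width=\"7%\">&nbsp;</td>",
                "<td>data</td>",
                "</tr>");
        System.out.println(removeNbspLines(html));
    }
}
```

Run on the sample above, this prints the <tr>, <td>data</td> and </tr> lines; both &nbsp; cells are gone.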


Tags: java, html, html-entities, jsoup

Nick Betcher
