// Method to strip HTML
public static String stripHtml(String inStr) {
    boolean inTag = false;
    char c;
    StringBuffer outStr = new StringBuffer();
    int len = inStr.length();
    for (int i = 0; i < len; i++) {
        c = inStr.charAt(i);
        if (c == '<') {
            inTag = true;
        }
        if (!inTag) {
            outStr.append(c);
        }
        if (c == '>') {
            inTag = false;
        }
    }
    // Print to show that this method is removing the necessary characters
    System.out.println(outStr);
    return outStr.toString();
}
I need everything between "<" and ">", including the angle brackets themselves, removed from the input, while the remaining characters are still printed. For instance:
input: app<html>le
expected: apple
However, it should also remove a lone "<" or ">" when it finds one, and my method isn't doing that.
input: app<le
output: app<le
expected: apple
Please let me know what to fix.
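One possible way to get the behaviour described above is to look ahead for the closing ">" whenever a "<" is seen: if one exists, skip the whole tag; if not, drop only the stray bracket. The following is a minimal sketch under that assumption, not a full HTML cleaner. The class name StripTagsSketch and the method name stripTags are made up for illustration.

```java
class StripTagsSketch {
    // Removes "<...>" spans, and also removes any lone '<' or '>'.
    static String stripTags(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '<') {
                int close = in.indexOf('>', i + 1);
                // Complete tag: jump past its '>'. Lone '<': skip only that character.
                i = (close >= 0) ? close + 1 : i + 1;
            } else if (c == '>') {
                i++; // stray '>' with no opening '<' before it
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("app<html>le")); // prints "apple"
        System.out.println(stripTags("app<le"));      // prints "apple"
        System.out.println(stripTags("app>le"));      // prints "apple"
    }
}
```

Note that this treats any "<" followed later by a ">" as a tag, so text such as "a < b and c > d" would lose the part in between. That is acceptable for the examples in the question, but probably not for arbitrary input.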
Try parsing the HTML with an HTML parser such as jsoup or TagSoup.
Once you have the DOM, just call getTextContent() on the root element (if you stay in jsoup's own document model, the equivalent call is text()).
From the API documentation (newer versions of Java behave the same): "This attribute returns the text content of this node and its descendants. [...] no serialization is performed, the returned string does not contain any markup."
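To illustrate what getTextContent() returns, here is a rough sketch that uses only the JDK's built-in XML parser. It assumes the input is already well-formed markup, which real-world HTML often is not; that is the reason for suggesting a lenient HTML parser in the first place. The class name TextContentSketch and the method name textOf are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TextContentSketch {
    // Parses well-formed markup into a W3C DOM and returns the text of the root element.
    static String textOf(String wellFormedMarkup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new InputSource(new StringReader(wellFormedMarkup)));
            // getTextContent() concatenates the text of the node and all descendants.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<p>app<b>le</b></p>")); // prints "apple"
    }
}
```

This will throw on input like "app<le", since that is not well-formed; a parser such as jsoup or TagSoup is meant to tolerate that kind of input.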