Trying to remove html from text with Java

Question

I've got an ArrayList<String> named fields. I'm trying to strip the HTML from each String using the replaceAll function, but I get the feeling that I'm getting the regex String wrong (the 2nd regex is one I found that is meant to match a generic HTML tag). Can anyone give me some tips on how to correct this?

for(int j = 0; j<fields.size(); j++)    
{
    String k = fields.get(j);
    k.replaceAll("<br>", "\n");
    k.replaceAll("<(\"[^\"]*\"|'[^']*'|[^'\">])*>", "");
    k.replaceAll("&lt;", "<");
    k.replaceAll("&gt;", ">");
    fields.set(j, k);
}

arshajii · Accepted Answer

Remember that strings are immutable: replaceAll returns a new String and leaves k unchanged, so you need to re-assign k each time you call it:

String k = fields.get(j);
k = k.replaceAll("<br>", "\n");
...
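
For completeness, here is a minimal sketch of the whole loop with the re-assignment applied to every call. The class name HtmlStripper, the helper stripHtml, and the sample input are hypothetical names chosen for illustration; the four patterns are copied unchanged from the question, so this only demonstrates the immutability fix and makes no claim that the regex handles every kind of HTML.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HtmlStripper {

    // Hypothetical helper: applies the question's four replacements in order.
    // Each result is assigned back to s because replaceAll returns a new String.
    public static String stripHtml(String s) {
        s = s.replaceAll("<br>", "\n");                            // line breaks become newlines
        s = s.replaceAll("<(\"[^\"]*\"|'[^']*'|[^'\">])*>", "");   // drop remaining tags
        s = s.replaceAll("&lt;", "<");                             // decode escaped angle brackets
        s = s.replaceAll("&gt;", ">");
        return s;
    }

    // Same loop shape as in the question: read, transform, write back.
    public static void stripAll(List<String> fields) {
        for (int j = 0; j < fields.size(); j++) {
            fields.set(j, stripHtml(fields.get(j)));
        }
    }

    public static void main(String[] args) {
        // Sample data is an assumption, not taken from the question.
        List<String> fields = new ArrayList<>(Arrays.asList(
                "<p class=\"x\">Hello<br>world</p>",
                "1 &lt; 2"));
        stripAll(fields);
        for (String field : fields) {
            System.out.println(field);
        }
    }
}
```

Note that the entities are decoded after the tags are removed, so a literal &lt;b&gt; in the text ends up as <b> in the output instead of being stripped.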


Tags: java, html, regex

user1724159
