
Java regex to strip out XML tags, but not tag contents

I have the following Java code:

str = str.replaceAll("<.*?>.*?</.*?>|<.*?/>", "");

This turns a String like so:

How now <fizz>brown</fizz> cow.

Into:

How now  cow.

However, I want it to strip just the <fizz> and </fizz> tags, or standalone <fizz/> tags, and leave the element's content alone. So, a regex that would turn the above into:

How now brown cow.

Or, using a more complex String, something that turns:

How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.

Into:

How now brown cow.

I tried this:

str = str.replaceAll("<.*?></.*?>|<.*?/>", "");

And that doesn't work at all. Any ideas? Thanks in advance!

IAmYourFaja asked Apr 02 '13 16:04


4 Answers

"How now <fizz>brown</fizz> cow.".replaceAll("<[^>]+>", "")
Sam Barnum answered Nov 18 '22 00:11


You were almost there ;)

Try this:

str = str.replaceAll("<.*?>", "");
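The lazy pattern <.*?> and the negated character class <[^>]+> from the answer above give the same result on the strings in the question. They can differ on other input, though: by default `.` does not match a line terminator, so a tag that spans two lines is skipped by <.*?> but still matched by <[^>]+>. The sketch below illustrates this; the class name, method names, and the multi-line sample string are made up for the illustration and are not from the question.

```java
public class TagPatternComparison {

    // Negated character class: matches '<', then one or more non-'>' characters
    // (newlines included), then '>'.
    static String stripWithCharClass(String s) {
        return s.replaceAll("<[^>]+>", "");
    }

    // Lazy dot: '.' does not match line terminators unless DOTALL is enabled,
    // so a tag containing a newline is not matched.
    static String stripWithLazyDot(String s) {
        return s.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: the opening tag is split across two lines.
        String multiLine = "How now <fizz\n  id=\"1\">brown</fizz> cow.";

        System.out.println(stripWithCharClass(multiLine)); // opening tag removed
        System.out.println(stripWithLazyDot(multiLine));   // opening tag left in place
    }
}
```

If tags may contain line breaks, the character-class form (or the lazy form compiled with Pattern.DOTALL) is probably the safer choice.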
TheEwook answered Nov 17 '22 23:11


While there are other correct answers, none of them gives any explanation.

Your regex <.*?>.*?</.*?>|<.*?/> doesn't work because it matches an opening tag, everything after it up to the next closing tag, and that closing tag itself, so the content between the tags is removed too. You can see that in action on debuggex.

Your second attempt <.*?></.*?>|<.*?/> doesn't work because it matches from the beginning of a tag up to the first closing tag that immediately follows another tag, that is, the first place where > is directly followed by </. In <fizz>brown</fizz> there is text between the two tags, so nothing matches and the string is left unchanged. That is kind of a mouthful, but you can understand better what's going on in this example.

The regex you need is much simpler: <.*?>. It simply matches every tag, whether it is an opening, closing, or self-closing one.
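To make the difference concrete, here is a small sketch that runs all three patterns over the two sample strings from the question. The class and method names are only for illustration.

```java
public class TagRegexDemo {

    // First attempt from the question: removes the tags and the content between them.
    static String firstAttempt(String s) {
        return s.replaceAll("<.*?>.*?</.*?>|<.*?/>", "");
    }

    // Second attempt from the question: only matches where '>' is directly
    // followed by '</', or where a self-closing tag ends in '/>'.
    static String secondAttempt(String s) {
        return s.replaceAll("<.*?></.*?>|<.*?/>", "");
    }

    // Suggested pattern: removes each tag on its own and keeps the content.
    static String stripTags(String s) {
        return s.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String simple = "How now <fizz>brown</fizz> cow.";
        String nested = "How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.";

        System.out.println(firstAttempt(simple));  // How now  cow.
        System.out.println(firstAttempt(nested));  // How  cow.
        System.out.println(secondAttempt(simple)); // unchanged: no '></' or '/>' in the input
        System.out.println(secondAttempt(nested)); // How  cow.
        System.out.println(stripTags(simple));     // How now brown cow.
        System.out.println(stripTags(nested));     // How now brown cow.
    }
}
```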

Sergiu Toarca answered Nov 18 '22 00:11


You can try this too:

str = str.replaceAll("<.*?>", "");

The example below shows it applied to both strings from the question:

public class StringUtils {

    public static void main(String[] args) {
        System.out.println(StringUtils.replaceAll("How now <fizz>brown</fizz> cow."));
        System.out.println(StringUtils.replaceAll("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
    }

    public static String replaceAll(String strInput) {
        return strInput.replaceAll("<.*?>", "");
    }
}

Output:

How now brown cow.
How now brown cow.
1218985 answered Nov 17 '22 23:11