I have the following Java code:
str = str.replaceAll("<.*?>.*?</.*?>|<.*?/>", "");
This turns a String like so:
How now <fizz>brown</fizz> cow.
Into:
How now cow.
However, I want it to strip only the <fizz> and </fizz> tags (or standalone tags such as <yoda/>) and leave the element's content alone. So I need a regex that would turn the above into:
How now brown cow.
Or, using a more complex String, something that turns:
How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.
Into:
How now brown cow.
I tried this:
str = str.replaceAll("<.*?></.*?>|<.*?/>", "");
And that doesn't work at all. Any ideas? Thanks in advance!
"How now <fizz>brown</fizz> cow.".replaceAll("<[^>]+>", "")
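The [^>]+ form does not need a lazy quantifier at all: a negated character class can never run past the next >, so every match is exactly one tag. Here is a small sketch applying it to both strings from the question (the class and method names are made up for illustration):

```java
public class NegatedClassDemo {
    // '<', then one or more characters that are not '>', then '>'.
    // Since [^>] cannot consume '>', each match ends at the close of one tag.
    static String stripTags(String input) {
        return input.replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("How now <fizz>brown</fizz> cow."));
        System.out.println(stripTags("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
    }
}
```

Both lines print "How now brown cow.". One difference from <.*?>: in Java, . does not match line terminators unless the DOTALL flag is set, whereas [^>] does, so this version also strips a tag that happens to be split across lines.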
You were almost there ;)
Try this:
str = str.replaceAll("<.*?>", "");
While there are other correct answers, none give any explanation.
The reason your regex <.*?>.*?</.*?>|<.*?/> doesn't work is that it matches each tag together with everything inside it: the .*? in the middle consumes the element's content along with the opening and closing tags. You can see that in action on Debuggex.
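To make this concrete, here is a sketch that runs your original pattern on the first example (the class name FirstAttemptDemo is only illustrative). The brackets in the output show that "brown" is gone and two spaces are left behind:

```java
public class FirstAttemptDemo {
    // <.*?> matches <fizz>, the middle .*? matches "brown", and </.*?> matches </fizz>,
    // so the whole element (tags plus content) is replaced with "".
    static String apply(String input) {
        return input.replaceAll("<.*?>.*?</.*?>|<.*?/>", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + apply("How now <fizz>brown</fizz> cow.") + "]");
    }
}
```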
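The reason your second attempt <.*?></.*?>|<.*?/> doesn't work is that its first alternative only matches where a tag's > is immediately followed by a closing tag's </, and the lazy .*? keeps growing from the first < until it finds that pair. In other words, it selects from the beginning of a tag up to the first closing tag that directly follows another tag. That is kind of a mouthful, but the examples make it clearer: How now <fizz>brown</fizz> cow. contains neither ></ nor />, so nothing is removed, while in the second string the match runs from <buzz> all the way to </buzz>.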
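A minimal sketch that runs the second attempt on both strings from the question reproduces both behaviours. SecondAttemptDemo is a hypothetical name; the pattern is yours, unchanged:

```java
public class SecondAttemptDemo {
    static final String SECOND_ATTEMPT = "<.*?></.*?>|<.*?/>";

    static String apply(String input) {
        return input.replaceAll(SECOND_ATTEMPT, "");
    }

    public static void main(String[] args) {
        // No "></" and no "/>" here, so nothing matches and the string is unchanged.
        System.out.println(apply("How now <fizz>brown</fizz> cow."));
        // After <buzz>, the lazy .*? grows until it reaches "/></" inside "<yoda/></buzz>",
        // so everything from <buzz> through </buzz> is removed.
        System.out.println(apply("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
    }
}
```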
The regex you need is much simpler: <.*?>. It selects every tag on its own, regardless of whether it is an opening, closing, or self-closing tag.
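To see that each match is a single tag, you can list the matches with Pattern and Matcher.find() instead of replacing them. This is a sketch; TagMatchDemo and findTags are illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagMatchDemo {
    // The lazy .*? stops at the first '>', so each match covers exactly one tag.
    private static final Pattern TAG = Pattern.compile("<.*?>");

    static List<String> findTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Prints [<buzz>, <fizz>, </fizz>, <yoda/>, </buzz>]
        System.out.println(findTags("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
    }
}
```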
You can try this too:
str = str.replaceAll("<.*?>", "");
Please have a look at the example below for a better understanding:
public class StringUtils {
    public static void main(String[] args) {
        System.out.println(StringUtils.replaceAll("How now <fizz>brown</fizz> cow."));
        System.out.println(StringUtils.replaceAll("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
    }

    // Removes every tag (opening, closing or self-closing) but keeps the text between tags.
    public static String replaceAll(String strInput) {
        return strInput.replaceAll("<.*?>", "");
    }
}
Output:
How now brown cow.
How now brown cow.
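If you strip tags from many strings, note that String.replaceAll is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so it compiles the regex on every call. A sketch that compiles the same pattern once and reuses it (TagStripper is an illustrative name, not a library class):

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Compiled once; String.replaceAll would recompile the regex on each call.
    private static final Pattern TAG = Pattern.compile("<.*?>");

    static String strip(String input) {
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("How now <fizz>brown</fizz> cow."));
        System.out.println(strip("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
    }
}
```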