I have a simple XML file and I want to remove everything before the first <item> tag.
<sometag>
<something>
.....
</something>
<item>item1
</item>
....
</sometag>
The following Java code is not working:
String cleanxml = rawxml.replace("^[\\s\\S]+<item>", "");
What is the correct way to do this? And how do I address the non-greedy issue? Sorry I'm a C# programmer.
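Well, if you want to use regex, then you can use replaceAll. This solution uses a reluctant quantifier and a capturing group that is referenced as $1 in the replacement. The (?s) flag (DOTALL) is needed because, by default, . does not match line breaks, and your input spans several lines:
String cleanxml = rawxml.replaceAll("(?s).*?(<item>.*)", "$1");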
Alternatively, you can use replaceFirst. This solution uses a positive lookahead, so the <item> tag itself is not consumed. Again, (?s) lets . match across line breaks:
String cleanxml = rawxml.replaceFirst("(?s).*?(?=<item>)", "");
It makes more sense to just use indexOf and substring, though.
String cleanxml = rawxml.substring(rawxml.indexOf("<item>"));
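One caveat with the one-liner: indexOf returns -1 when the tag is missing, and substring(-1) then throws a StringIndexOutOfBoundsException. A minimal sketch of a guarded version follows; the class and method names are made up for illustration, and returning the input unchanged when no <item> is found is just one possible choice.

```java
public class ItemTrimmer {
    // Hypothetical helper: returns the input from the first "<item>" onward.
    // If the tag is absent, indexOf returns -1 and the input is returned as-is.
    public static String dropBeforeFirstItem(String rawxml) {
        int start = rawxml.indexOf("<item>");
        return start >= 0 ? rawxml.substring(start) : rawxml;
    }

    public static void main(String[] args) {
        String rawxml = "<sometag>\n<something>\n</something>\n<item>item1\n</item>\n</sometag>";
        System.out.println(dropBeforeFirstItem(rawxml));
    }
}
```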
The reason replace doesn't work is that neither its char overload nor its CharSequence overload is regex-based. It performs simple character (sequence) replacement.
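To make the difference concrete, here is a small sketch that runs the pattern from the question three ways. The class and method names are hypothetical. It shows that replace looks for the pattern text literally, that the greedy pattern eats up to the last <item> (tag included), and that a reluctant quantifier plus a lookahead stops before the first one.

```java
public class ReplaceDemo {
    // String.replace treats its first argument as literal text, not a regex,
    // so nothing matches and the input comes back unchanged.
    public static String literal(String s) {
        return s.replace("^[\\s\\S]+<item>", "");
    }

    // Regex-based, but greedy: [\s\S]+ runs to the LAST "<item>" and the tag
    // itself is part of the match, so it is removed too.
    public static String greedy(String s) {
        return s.replaceFirst("^[\\s\\S]+<item>", "");
    }

    // Reluctant quantifier plus lookahead: stops before the FIRST "<item>"
    // and leaves the tag in place.
    public static String reluctant(String s) {
        return s.replaceFirst("^[\\s\\S]*?(?=<item>)", "");
    }

    public static void main(String[] args) {
        String s = "<sometag>\n<item>a</item>\n<item>b</item>\n</sometag>";
        System.out.println(literal(s).equals(s)); // true: nothing was replaced
        System.out.println(greedy(s));            // starts at "b</item>"
        System.out.println(reluctant(s));         // starts at "<item>a</item>"
    }
}
```

Since [\s\S] already matches line breaks, this form of the pattern does not need the DOTALL flag.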
Also, as others are warning you, unless you're only processing simple XML, you shouldn't use regex. You should use an actual XML parser instead.
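As a rough illustration of the parser route, the sketch below uses the DOM parser that ships with the JDK (javax.xml.parsers) to collect the text of every <item> element. The class and method names are invented for this example, it assumes the input is well-formed XML, and wrapping parser errors in an IllegalArgumentException is only one way to handle them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ItemReader {
    // Hypothetical helper: parses the XML and returns the trimmed text
    // content of each <item> element, in document order.
    public static List<String> itemTexts(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                texts.add(items.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            // Malformed input or parser configuration problems end up here.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sometag><something>x</something><item>item1\n</item></sometag>";
        System.out.println(itemTexts(xml)); // [item1]
    }
}
```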