 

Simple Java regular expression replace question

Tags: java, regex

I have a simple XML file and I want to remove everything before the first <item> tag.

<sometag>
  <something>
   .....
  </something>
  <item>item1
  </item>
  ....
</sometag>

The following Java code is not working:

String cleanxml = rawxml.replace("^[\\s\\S]+<item>", "");

What is the correct way to do this, and how do I handle the non-greedy matching issue? Sorry, I'm a C# programmer.

asked Mar 30 '10 17:03 by Yang



1 Answer

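Well, if you want to use regex, then you can use replaceAll. This solution uses a reluctant quantifier and a capturing group that the replacement string refers to as $1. The inline (?s) flag (DOTALL) is needed because, by default, . in Java does not match line terminators, so without it the pattern cannot reach across the lines of your XML:

String cleanxml = rawxml.replaceAll("(?s).*?(<item>.*)", "$1");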
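A minimal sketch for checking the replaceAll approach against input shaped like the XML in the question. The RAW_XML string and the class and method names are illustrative assumptions, not part of the original answer. It runs the reluctant pattern with and without the inline (?s) flag, which shows why DOTALL matters for multi-line input:

```java
class ReplaceAllDemo {
    // Assumed sample input, modeled on the XML in the question.
    static final String RAW_XML = "<sometag>\n"
            + "  <something>\n"
            + "   .....\n"
            + "  </something>\n"
            + "  <item>item1\n"
            + "  </item>\n"
            + "</sometag>\n";

    // (?s) turns on DOTALL so '.' also matches '\n'; the reluctant .*? stops
    // at the first <item>, and $1 puts the captured "<item>..." tail back.
    static String withDotAll(String rawxml) {
        return rawxml.replaceAll("(?s).*?(<item>.*)", "$1");
    }

    // Same pattern without (?s): '.' cannot cross a line break, so the only
    // match is "  <item>item1" on a single line and just its indentation goes.
    static String withoutDotAll(String rawxml) {
        return rawxml.replaceAll(".*?(<item>.*)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(withDotAll(RAW_XML));
        System.out.println("----");
        System.out.println(withoutDotAll(RAW_XML));
    }
}
```

Without (?s), everything before the <item> line survives; with it, the output starts at the first <item>.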

Alternatively, you can use replaceFirst. This solution uses a positive lookahead, so the <item> tag itself is not part of the match; the (?s) flag lets . match line breaks:

String cleanxml = rawxml.replaceFirst("(?s).*?(?=<item>)", "");
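The lookahead variant can be checked the same way. This sketch uses hypothetical class and method names and an assumed sample string; it shows that the reluctant .*? stops at the first <item> when there are several, and that input without any <item> comes back unchanged because replaceFirst finds no match:

```java
class ReplaceFirstDemo {
    // (?s) lets .*? run across line breaks; (?=<item>) is a zero-width
    // lookahead, so the match ends just before the first <item> and the
    // tag itself is kept.
    static String stripBeforeFirstItem(String rawxml) {
        return rawxml.replaceFirst("(?s).*?(?=<item>)", "");
    }

    public static void main(String[] args) {
        // Assumed sample input with two items on separate lines.
        String rawxml = "<sometag>\n  <something/>\n  <item>item1</item>\n"
                + "  <item>item2</item>\n</sometag>";
        System.out.println(stripBeforeFirstItem(rawxml));
        // No <item> at all: no match, so the input is returned unchanged.
        System.out.println(stripBeforeFirstItem("<sometag/>"));
    }
}
```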

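It makes more sense to just use indexOf and substring, though. Keep in mind that indexOf returns -1 when there is no <item>, and substring(-1) throws a StringIndexOutOfBoundsException, so check for that first:

int start = rawxml.indexOf("<item>");
String cleanxml = start >= 0 ? rawxml.substring(start) : rawxml;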

The reason replace doesn't work is that neither the char nor the CharSequence overload of String.replace is regex-based. Both do plain character (sequence) replacement, so your pattern is searched for as literal text and never found.
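A small sketch of that difference, using the question's own pattern on an assumed sample string (class and method names are hypothetical). It runs the pattern through replace and replaceAll, then makes it reluctant. Note that [\s\S] already matches newlines, so this variant does not need (?s):

```java
class LiteralReplaceDemo {
    // Assumed sample input with two <item> elements.
    static final String RAW_XML =
            "<sometag>\n  <something/>\n  <item>item1</item>\n  <item>item2</item>\n</sometag>";

    // String.replace is literal: it looks for the exact text of the pattern,
    // which never occurs, so the input comes back unchanged.
    static String literal(String s) {
        return s.replace("^[\\s\\S]+<item>", "");
    }

    // replaceAll compiles a regex, but greedy + backtracks only as far as
    // the LAST <item>, and the tag itself is part of the match.
    static String greedy(String s) {
        return s.replaceAll("^[\\s\\S]+<item>", "");
    }

    // Reluctant *? stops at the first <item>; the lookahead keeps the tag.
    static String reluctant(String s) {
        return s.replaceAll("^[\\s\\S]*?(?=<item>)", "");
    }

    public static void main(String[] args) {
        System.out.println(literal(RAW_XML));
        System.out.println(greedy(RAW_XML));
        System.out.println(reluctant(RAW_XML));
    }
}
```

The greedy version also shows the non-greedy issue from the question: it removes item1 entirely, because + runs to the last <item>.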


Also, as others have warned you, unless you're processing simple XML, you shouldn't use regex. Use an actual XML parser instead.
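If you take the parser route, a minimal sketch using the DOM API that ships with the JDK (javax.xml.parsers) could look like the following. It assumes what you actually need is the content of the <item> elements rather than a trimmed string; the class and method names and the sample input are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlParserDemo {
    // Parses the whole document with the JDK's DOM parser and returns the
    // trimmed text of every <item> element, in document order.
    static List<String> itemTexts(String rawxml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rawxml)));
        NodeList items = doc.getElementsByTagName("item");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            texts.add(items.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input modeled on the question's XML.
        String rawxml = "<sometag>\n  <something>.....</something>\n"
                + "  <item>item1\n  </item>\n  <item>item2</item>\n</sometag>";
        System.out.println(itemTexts(rawxml)); // prints [item1, item2]
    }
}
```

Unlike the string-based approaches, this does not depend on how the tags are spaced or indented, and it fails loudly on malformed XML.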

answered Oct 19 '22 20:10 by polygenelubricants
