I am trying to strip all non-word characters from a string, except when they are part of an entity reference such as &amp;, i.e. a pattern like &[\w]+;
For example:
abc; => abc
abc &amp; => abc &amp;
abc& => abc
If I use string.replaceAll("\\W", ""), it also removes the ; and the & from the second example, which I don't want.
Could a negative look-ahead give a quick regex solution to this problem?
First of all, I really like the question. Doing this with lookarounds alone in a single replaceAll is not practical: to tell whether a ; closes an entity reference, we would need a negative look-behind of variable length, which Java's regex engine does not allow. If it were allowed, this would not be that difficult.
Since that route is closed, you can use a little hack: first replace the closing semicolon of each entity reference with some character sequence that you are sure won't appear in the rest of the string, such as XXX. I know this is not strictly correct, because it breaks if the input already contains the placeholder, but with this approach there is no way around it.
So, here's what you can try:
String str = "a;b&c &";
str = str.replaceAll("(&\\w+);", "$1XXX")
.replaceAll("&(?!\\w+?XXX)|[^\\w&]", "")
.replaceAll("(&\\w+)XXX", "$1;");
System.out.println(str);
Explanation:
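The first replaceAll replaces an entity such as &amp; with &ampXXX (or whatever sequence you chose to stand in for the closing ;).
The second replaceAll removes any & that is not followed by \\w+XXX, as well as any character that is neither a word character nor &. This drops every & that is not part of an &amp;-style pattern, along with all other non-word characters.
The third replaceAll replaces XXX with ;, turning &ampXXX back into &amp;.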
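To see the three passes at work, here is a small trace that prints the intermediate string after each step. The input a;b&amp;c & and the names PlaceholderTrace, protect, strip and restore are assumptions chosen for illustration, not taken from the question. The last call shows the weakness of the hack when the input already contains the placeholder.

```java
class PlaceholderTrace {
    // Pass 1: swap the closing ';' of each entity reference for the placeholder.
    static String protect(String s) {
        return s.replaceAll("(&\\w+);", "$1XXX");
    }

    // Pass 2: drop stray '&' characters and every other non-word character.
    static String strip(String s) {
        return s.replaceAll("&(?!\\w+?XXX)|[^\\w&]", "");
    }

    // Pass 3: turn the placeholder back into ';'.
    static String restore(String s) {
        return s.replaceAll("(&\\w+)XXX", "$1;");
    }

    public static void main(String[] args) {
        String step1 = protect("a;b&amp;c &");
        String step2 = strip(step1);
        String step3 = restore(step2);
        System.out.println(step1); // a;b&ampXXXc &
        System.out.println(step2); // ab&ampXXXc
        System.out.println(step3); // ab&amp;c

        // Caveat: input that already contains the placeholder is misread as an entity.
        System.out.println(restore(strip(protect("&fooXXX bar")))); // &foo;bar
    }
}
```

If the placeholder can occur in real input, a sequence built from characters that cannot appear there would be a safer choice than XXX.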
To make this easier to understand, you can instead use the Pattern and Matcher classes; I would always prefer them whenever the replacement criteria are complex.
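import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "a;b&c &";
// Match either a whole entity reference or a single non-word character.
Pattern pattern = Pattern.compile("&\\w+;|[^\\w]");
Matcher matcher = pattern.matcher(str);
// appendReplacement(StringBuilder, ...) needs Java 9+; use StringBuffer on older versions.
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
    String match = matcher.group();
    if (!match.matches("&\\w+;")) {
        // A lone non-word character: drop it.
        matcher.appendReplacement(sb, "");
    } else {
        // An entity reference: copy it through unchanged.
        matcher.appendReplacement(sb, match);
    }
}
matcher.appendTail(sb);
System.out.println(sb.toString());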
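On Java 9 or later, the same keep-or-drop decision can probably be written without the explicit loop by passing a function to Matcher.replaceAll. This is only a sketch: the names EntityCleaner and clean are made up for illustration, and the sample input with &amp; is my own assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityCleaner {
    // Same pattern as above: a whole entity reference, or one non-word character.
    private static final Pattern TOKEN = Pattern.compile("&\\w+;|[^\\w]");

    static String clean(String input) {
        return TOKEN.matcher(input).replaceAll(m ->
                m.group().matches("&\\w+;")
                        ? Matcher.quoteReplacement(m.group()) // keep the entity as-is
                        : "");                                 // drop everything else
    }

    public static void main(String[] args) {
        System.out.println(clean("a;b&amp;c &")); // ab&amp;c
        System.out.println(clean("abc&"));        // abc
    }
}
```

Matcher.quoteReplacement is not strictly needed for text that matches &\w+;, but it keeps the lambda safe if the pattern is later widened to something that can contain $ or \.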
This one is similar to @Eric's code, but generalizes it. That one only works for &amp;, and only once it has been fixed to avoid the NullPointerException it currently throws.
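As a side note, if a capture group is used instead of a lookbehind, the whole thing can apparently be squeezed into one replaceAll call: match either an entity (captured) or a single non-word character, and replace with $1. This relies on Java substituting nothing for a group that did not take part in the match. Treat it as a sketch with made-up names (SinglePassCleaner, clean) and assumed sample inputs rather than a drop-in replacement.

```java
class SinglePassCleaner {
    static String clean(String input) {
        // Group 1 is set only when an entity reference matched; for any other
        // non-word character it is unset and "$1" expands to nothing.
        return input.replaceAll("(&\\w+;)|\\W", "$1");
    }

    public static void main(String[] args) {
        System.out.println(clean("abc;"));      // abc
        System.out.println(clean("abc &amp;")); // abc&amp;
        System.out.println(clean("abc&"));      // abc
    }
}
```

The order of the alternatives matters: the entity branch has to come first so that an & starting an entity is not consumed by \W.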