I want to transform every "*" into ".*", except for an escaped "\*".
String regex01 = "\\*toto".replaceAll("[^\\\\]\\*", ".*");
assertTrue("*toto".matches(regex01));// True
String regex02 = "toto*".replaceAll("[^\\\\]\\*", ".*");
assertTrue("tototo".matches(regex02));// True
String regex03 = "*toto".replaceAll("[^\\\\]\\*", ".*");
assertTrue("tototo".matches(regex03));// Error
If the "*" is the first character, an error occurs: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
What is the correct regex?
This is currently the only solution capable of dealing with multiple escaped \ in a row:
String regex = input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
Let's print the pattern string to have a look at what the regex engine actually parses:
\G((?:[^\\*]|\\[\\*])*)[*]
((?:[^\\*]|\\[\\*])*) matches a sequence of characters that are not \ or *, or the escape sequences \\ or \*. We match all the characters that we don't want to touch and put them in a capturing group so that we can put them back.
The above sequence is followed by an unescaped asterisk, as described by [*].
To make sure that we don't "jump" ahead when the regex can't match an unescaped *, \G is used so that the next match can only start at the beginning of the string or where the last match ended.
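To see what \G contributes, here is a small sketch that runs the same replacement with and without the anchor on the input \*toto. The class name AnchorDemo, the constant names, and the convert helper are made up for this illustration; only the anchored pattern comes from the answer.

```java
class AnchorDemo {
    // Pattern from the answer, including the \G anchor.
    static final String ANCHORED = "\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]";
    // Same pattern with \G removed, for comparison only.
    static final String UNANCHORED = "((?:[^\\\\*]|\\\\[\\\\*])*)[*]";

    // Hypothetical helper: applies the given pattern with the answer's replacement.
    static String convert(String input, String pattern) {
        return input.replaceAll(pattern, "$1.*");
    }

    public static void main(String[] args) {
        String input = "\\*toto"; // the characters \*toto (an escaped asterisk)

        // With \G the only allowed start is index 0, where no match exists,
        // so the escaped asterisk is left alone.
        System.out.println(convert(input, ANCHORED));   // \*toto

        // Without \G the engine retries at index 1, skips the backslash,
        // and wrongly rewrites the escaped asterisk.
        System.out.println(convert(input, UNANCHORED)); // \.*toto
    }
}
```

Without the anchor, the engine is free to start a match in the middle of an escape sequence, which is exactly the "jump" that \G prevents.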
Why such a long solution? It is necessary because a look-behind that checks whether the number of consecutive \ preceding a * is odd or even is not officially supported by Java regex. Therefore, we need to consume the string from left to right, taking escape sequences into account, until we encounter an unescaped * and replace it with .*.
// Requires: import java.util.regex.Pattern;
String[] inputs = {
"toto*",
"\\*toto",
"\\\\*toto",
"*toto",
"\\\\\\\\*toto",
"\\\\*\\\\\\*\\*\\\\\\\\*"};
for (String input: inputs) {
String regex = input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
System.out.println(input);
System.out.println(Pattern.compile(regex));
System.out.println();
}
toto*
toto.*

\*toto
\*toto

\\*toto
\\.*toto

*toto
.*toto

\\\\*toto
\\\\.*toto

\\*\\\*\*\\\\*
\\.*\\\*\*\\\\.*
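As a rough sketch of how the converted strings could then be used for matching, as in the question's assertions, the replacement can be wrapped in a helper. The names WildcardDemo and toRegex are hypothetical, and the sketch assumes the input contains no regex metacharacters other than * and \, since nothing else is escaped.

```java
class WildcardDemo {
    // Hypothetical helper: turns every unescaped * into .* using the pattern above.
    static String toRegex(String input) {
        return input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
    }

    public static void main(String[] args) {
        // Leading wildcard: the case that threw PatternSyntaxException in the question.
        System.out.println("tototo".matches(toRegex("*toto")));  // true
        // Trailing wildcard.
        System.out.println("tototo".matches(toRegex("toto*")));  // true
        // Escaped asterisk stays a literal asterisk.
        System.out.println("*toto".matches(toRegex("\\*toto"))); // true
        System.out.println("xtoto".matches(toRegex("\\*toto"))); // false
    }
}
```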
You need to use a negative lookbehind here:
String regex01 = input.replaceAll("(?<!\\\\)\\*", ".*");
(?<!\\\\) is a negative lookbehind: the * is matched only if it is not preceded by a backslash.
Examples:
regex01 = "\\*toto".replaceAll("(?<!\\\\)\\*", ".*");
//=> \*toto
regex01 = "*toto".replaceAll("(?<!\\\\)\\*", ".*");
//=> .*toto
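The sketch below compares the character-class pattern from the question with the lookbehind pattern, and then shows the one case from the \G-based answer that the lookbehind does not cover: an escaped backslash followed by an unescaped asterisk. The class name LookbehindDemo, the constant names, and the convert helper are invented for this illustration.

```java
class LookbehindDemo {
    static final String CHAR_CLASS = "[^\\\\]\\*";   // pattern from the question
    static final String LOOKBEHIND = "(?<!\\\\)\\*"; // pattern from this answer

    // Hypothetical helper: replaces every match of the pattern with .*
    static String convert(String input, String pattern) {
        return input.replaceAll(pattern, ".*");
    }

    public static void main(String[] args) {
        // [^\\] consumes the character before the asterisk, so the 'o' is lost.
        System.out.println(convert("toto*", CHAR_CLASS)); // tot.*
        // The lookbehind is zero-width, so nothing before the asterisk is consumed.
        System.out.println(convert("toto*", LOOKBEHIND)); // toto.*

        // [^\\] needs a preceding character, so a leading asterisk is never matched.
        System.out.println(convert("*toto", CHAR_CLASS)); // *toto
        System.out.println(convert("*toto", LOOKBEHIND)); // .*toto

        // Limitation: in \\*toto the backslash is itself escaped, so the asterisk
        // is unescaped, but the lookbehind still sees a backslash and skips it.
        System.out.println(convert("\\\\*toto", LOOKBEHIND)); // \\*toto
    }
}
```

If the input can contain escaped backslashes, the \G-based pattern from the other answer handles that last case and produces \\.*toto.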