Regex (Java) to find all characters preceded by an even number of other characters

Tags: java, regex

I'd like to manipulate a String in Java using a regex. The goal is to find every $ sign that is preceded by an even number of backslashes (\), including none, and add one more \ in front of it.

Example:

"$ Find the $ to \$ escape \\$ or not \\\$ escape \\\\$ like here $"

should result in:

"\$ Find the \$ to \$ escape \\\$ or not \\\$ escape \\\\\$ like here \$"

The rationale: some $ signs are already escaped with a \, and the string may also contain escaped backslashes in the form \\. I need to escape the remaining $ signs.

asked Jan 06 '12 at 12:01 by kongo09


1 Answer

This should do the job. Replace every match of:

(^|[^\\])(\\{2})*(?=\$)

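with the whole matched text followed by one backslash (written \\ in the replacement). The lookahead checks for the $ without consuming it, so the $ itself is not part of the match.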

Illustration in Perl:

$ perl -pe 's,(^|[^\\])(\\{2})*(?=\$),$&\\,g'
"$ Find the $ to \$ escape \\$ or not \\\$ escape \\\\$ like here $" # in...
"\$ Find the \$ to \$ escape \\\$ or not \\\$ escape \\\\\$ like here \$" # out
"\$ Find the \$ to \$ escape \\\$ or not \\\$ escape \\\\\$ like here \$" # in...
"\$ Find the \$ to \$ escape \\\$ or not \\\$ escape \\\\\$ like here \$" # out

In Java, the whole match is referred to as $0 in the replacement string. Sample code:

// package declaration skipped
import java.util.regex.Pattern;

public final class TestMatch
{
    // (^|[^\\])(\\{2})*(?=\$) written as a Java string literal:
    // every backslash of the regex has to be doubled.
    private static final Pattern p
        = Pattern.compile("(^|[^\\\\])(\\\\{2})*(?=\\$)");

    public static void main(final String... args)
    {
        String input = "\"$ Find the $ to \\$ escape \\\\$ or not \\\\\\$ "
            + "escape \\\\\\\\$ like here $\"";

        System.out.println(input);

        // Apply a first time. In the replacement, $0 is the whole match and
        // \\ (Java literal "\\\\") is a single literal backslash.
        input = p.matcher(input).replaceAll("$0\\\\");
        System.out.println(input);

        // Apply a second time: the input should not be altered
        input = p.matcher(input).replaceAll("$0\\\\");
        System.out.println(input);
    }
}

Output:

"$ Find the $ to \$ escape \\$ or not \\\$ escape \\\\$ like here $"
"\$ Find the \$ to \$ escape \\\$ or not \\\$ escape \\\\\$ like here \$"
"\$ Find the \$ to \$ escape \\\$ or not \\\$ escape \\\\\$ like here \$"
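One side effect of consuming the character in front of the backslashes: if the input starts with two adjacent dollar signs ($$), the first match is empty at index 0, Java's Matcher then resumes searching at index 1, and the first $ can no longer serve as the [^\\] character for the second one, so only the first $ is escaped. A minimal sketch of an alternative, assuming Java's negative lookbehind is acceptable, is (?<!\\)(?:\\{2})*(?=\$): it only asserts that the run of backslash pairs is not preceded by another backslash, and consumes nothing but the pairs themselves. The class name EscapeDollars, the method escape and the sample "$$ costs" are hypothetical, made up for this illustration.

```java
import java.util.regex.Pattern;

public final class EscapeDollars
{
    // Pattern from the answer: consumes the start of input or the
    // non-backslash character in front of an even run of backslashes.
    static final Pattern CONSUMING
        = Pattern.compile("(^|[^\\\\])(\\\\{2})*(?=\\$)");

    // Alternative (hypothetical name): a negative lookbehind checks that the
    // backslash pairs are not preceded by yet another backslash, without
    // consuming that preceding character.
    static final Pattern LOOKBEHIND
        = Pattern.compile("(?<!\\\\)(?:\\\\{2})*(?=\\$)");

    static String escape(final Pattern p, final String input)
    {
        // $0 = whole match, \\ = one literal backslash in the replacement
        return p.matcher(input).replaceAll("$0\\\\");
    }

    public static void main(final String... args)
    {
        final String edgeCase = "$$ costs";
        System.out.println(escape(CONSUMING, edgeCase));  // \$$ costs
        System.out.println(escape(LOOKBEHIND, edgeCase)); // \$\$ costs
    }
}
```

Java only supports lookbehind of bounded length, but (?<!\\) checks a single character, so that restriction does not matter here.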

A little explanation about the regex used is in order:

                # begin regex:
(               # start group
    ^           # find the beginning of input,
    |           # or
    [^\\]       # one character which is not the backslash
)               # end group
                # followed by
(               # start group
    \\{2}       # exactly two backslashes
)               # end group
*               # zero or more times
                # and at that position,
(?=             # begin lookahead
    \$          # find a $
)               # end lookahead
                # end regex
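The annotated layout above is also valid Java regex syntax once the pattern is compiled with Pattern.COMMENTS (the embedded flag (?x) does the same), which ignores whitespace and everything from # to the end of a line. A sketch, with hypothetical class and method names, that compiles the breakdown as-is and uses it exactly like the compact pattern:

```java
import java.util.regex.Pattern;

public final class CommentedRegex
{
    // The breakdown from the answer, compiled in free-spacing mode:
    // whitespace and "# ..." comments are ignored by the regex engine.
    static final Pattern COMMENTED = Pattern.compile(
          "(               # start group\n"
        + "    ^           # find the beginning of input,\n"
        + "    |           # or\n"
        + "    [^\\\\]     # one character which is not the backslash\n"
        + ")               # end group\n"
        + "(               # start group\n"
        + "    \\\\{2}     # exactly two backslashes\n"
        + ")               # end group\n"
        + "*               # zero or more times\n"
        + "(?=             # begin lookahead\n"
        + "    \\$         # find a $\n"
        + ")               # end lookahead\n",
        Pattern.COMMENTS);

    // The same regex written compactly, as in the sample code.
    static final Pattern COMPACT
        = Pattern.compile("(^|[^\\\\])(\\\\{2})*(?=\\$)");

    static String escape(final Pattern p, final String input)
    {
        return p.matcher(input).replaceAll("$0\\\\");
    }

    public static void main(final String... args)
    {
        final String input = "$ Find the $ to \\$ escape \\\\$ or not \\\\\\$ "
            + "escape \\\\\\\\$ like here $";
        System.out.println(escape(COMMENTED, input));
        System.out.println(escape(COMPACT, input).equals(escape(COMMENTED, input)));
    }
}
```

In this mode a literal space or # would have to be escaped, but this particular regex contains neither.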

To be really complete, here is the text the regex engine matches at each step (delimited by <>) and where the cursor stands after each match (marked by |). Because the first group consumes the character in front of the backslashes, that character is part of each match; because the lookahead consumes nothing, the cursor always stops right before the $:

# Before first run:
|"$ Find the $ to \$ escape \\$ or not \\\$ escape \\\\$ like here $"
# First match
<">|$ Find the $ to \$ escape \\$ or not \\\$ escape \\\\$ like here $"
# Second match
"$ Find the< >|$ to \$ escape \\$ or not \\\$ escape \\\\$ like here $"
# Third match
"$ Find the $ to \$ escape< \\>|$ or not \\\$ escape \\\\$ like here $"
# Fourth match
"$ Find the $ to \$ escape \\$ or not \\\$ escape< \\\\>|$ like here $"
# Fifth match
"$ Find the $ to \$ escape \\$ or not \\\$ escape \\\\$ like here< >|$"
# From now on, there is no match
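To check these positions against what Java actually does, here is a small sketch (class and method names are hypothetical) that walks the input with Matcher.find() and reports each match's start offset, end offset and matched text. Note that each match includes the non-backslash character in front of the backslash run, since the first group consumes it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ShowMatches
{
    static final Pattern P
        = Pattern.compile("(^|[^\\\\])(\\\\{2})*(?=\\$)");

    // Returns "start-end <matched text>" for every match, in order.
    static List<String> describeMatches(final String input)
    {
        final List<String> result = new ArrayList<>();
        final Matcher m = P.matcher(input);
        while (m.find())
        {
            result.add(m.start() + "-" + m.end() + " <" + m.group() + ">");
        }
        return result;
    }

    public static void main(final String... args)
    {
        final String input = "\"$ Find the $ to \\$ escape \\\\$ or not \\\\\\$ "
            + "escape \\\\\\\\$ like here $\"";
        for (final String s : describeMatches(input))
        {
            System.out.println(s);
        }
    }
}
```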
answered Oct 12 '22 at 13:10 by fge