
Regex to Retrieve Quoted String and Quote Character

Tags: java, regex

I have a language that defines a string as being delimited by either single or double quotes, where the delimiter is escaped within the string by doubling it. For example, all of the following are legal strings:

'This isn''t easy to parse.'
'Then John said, "Hello Tim!"'
"This isn't easy to parse."
"Then John said, ""Hello Tim!"""

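The doubling rule can be reversed with a plain string replacement once the delimiter is known. A minimal sketch, assuming a hypothetical helper named `unescape` (not part of the question) that receives the quote character and the text between the delimiters:

```java
public class QuoteUnescape {
    // Hypothetical helper: collapses each doubled delimiter back to a single one.
    // `quote` is the delimiter ("\"" or "'"); `body` is the text between the delimiters.
    public static String unescape(String quote, String body) {
        return body.replace(quote + quote, quote);
    }

    public static void main(String[] args) {
        // Prints: This isn't easy to parse.
        System.out.println(unescape("'", "This isn''t easy to parse."));
        // Prints: Then John said, "Hello Tim!"
        System.out.println(unescape("\"", "Then John said, \"\"Hello Tim!\"\""));
    }
}
```

The other quote character is left alone, so a single quote inside a double-quoted string passes through unchanged.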
I have a collection of such strings, separated by text that doesn't contain a quote. Using regular expressions, I am trying to extract each string from the list. For example, here is an input:

"Some String #1" OR 'Some String #2' AND "Some 'String' #3" XOR
'Some "String" #4' HOWDY "Some ""String"" #5" FOO 'Some ''String'' #6'

The regular expression to determine whether a string is of such a form is trivial:

^(?:"(?:[^"]|"")*"|'(?:[^']|'')*')(?:\s+[^"'\s]+\s+(?:"(?:[^"]|"")*"|'(?:[^']|'')*'))*
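A check of this kind could be run from Java roughly as follows. This is only a sketch: the class name `QuotedListValidator` and the method `isValid` are made up for illustration, and the pattern is assembled from a shared sub-expression so the quoted-string part is written once.

```java
import java.util.regex.Pattern;

public class QuotedListValidator {
    // One quoted string: "..." with "" as an escaped quote, or '...' with ''.
    private static final String STRING = "(?:\"(?:[^\"]|\"\")*\"|'(?:[^']|'')*')";
    // A string, then zero or more (separator, string) pairs.
    private static final Pattern LIST =
            Pattern.compile(STRING + "(?:\\s+[^\"'\\s]+\\s+" + STRING + ")*");

    public static boolean isValid(String input) {
        // matches() anchors at both ends, so no ^ or $ is needed.
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("\"Some String #1\" OR 'Some String #2'")); // true
        System.out.println(isValid("\"Some String #1\" 'Some String #2'"));    // false: no separator
    }
}
```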

After running the above expression to test whether it is of such a form, I need another regular expression to get each delimited string from the input. I plan to do this as follows:

Pattern pattern = Pattern.compile("What REGEX goes here?");
Matcher matcher = pattern.matcher(inputString);
int startIndex = 0;
while (matcher.find(startIndex))
{
    String quote        = matcher.group(1);
    String quotedString = matcher.group(2);
    ...
    startIndex = matcher.end();
}

I would like a regular expression that captures the quote character in group #1, and the text within quotes in group #2 (I am using Java Regex). So, for the above input, I am looking for a regular expression that produces the following output within each loop iteration:

Loop 1: matcher.group(1) = "
        matcher.group(2) = Some String #1
Loop 2: matcher.group(1) = '
        matcher.group(2) = Some String #2
Loop 3: matcher.group(1) = "
        matcher.group(2) = Some 'String' #3
Loop 4: matcher.group(1) = '
        matcher.group(2) = Some "String" #4
Loop 5: matcher.group(1) = "
        matcher.group(2) = Some ""String"" #5
Loop 6: matcher.group(1) = '
        matcher.group(2) = Some ''String'' #6

Patterns I have tried thus far (un-escaped, followed by escaped for Java code):

(["'])((?:[^\1]|\1\1)*)\1
"([\"'])((?:[^\\1]|\\1\\1)*)\\1"

(?<quot>")(?<val>(?:[^"]|"")*)"|(?<quot>')(?<val>(?:[^']|'')*)'
"(?<quot>\")(?<val>(?:[^\"]|\"\")*)\"|(?<quot>')(?<val>(?:[^']|'')*)'"

Both of these fail when trying to compile the pattern.

Is such a regular expression possible?

Jeff G, asked Dec 22 '15 23:12

1 Answer

Make a utility class that matches for you:

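import java.util.regex.Matcher;
import java.util.regex.Pattern;

class test {
    // Double-quoted string: group 1 is the quote, group 2 the body ("" is an escaped quote).
    private static final Pattern pd = Pattern.compile("(\")((?:[^\"]|\"\")*)\"");
    // Single-quoted string: same shape, with '' as the escaped quote.
    private static final Pattern ps = Pattern.compile("(')((?:[^']|'')*)'");

    // s must be exactly one quoted string. The single-quote matcher is returned
    // unmatched, so call matches() on it before reading its groups.
    public static Matcher match(String s) {
        Matcher md = pd.matcher(s);
        if (md.matches()) return md;
        else return ps.matcher(s);
    }
}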
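
The class above handles one quoted string at a time. For scanning a whole input in a `find()` loop, a single pattern may also be possible. The following is a sketch of a different technique, a back-reference guarded by a negative lookahead, and it assumes the input has already been validated. The names `QuotedStringScanner` and `scan` are hypothetical. Java rejects a group name that is defined twice, so numbered groups are used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStringScanner {
    // Group 1: the opening quote. Group 2: the body, where each step is either
    // one character that is not that quote, or the quote doubled. DOTALL lets
    // '.' match line breaks, as [^"] and [^'] do.
    private static final Pattern QUOTED =
            Pattern.compile("([\"'])((?:(?!\\1).|\\1\\1)*)\\1", Pattern.DOTALL);

    // Returns {quote, body} pairs in order of appearance.
    public static List<String[]> scan(String input) {
        List<String[]> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) { // find() resumes after the previous match
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "\"Some String #1\" OR 'Some ''String'' #6'";
        for (String[] pair : scan(input)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

On malformed input, such as an unterminated string, backtracking can still produce a shorter match, which is why the sketch relies on the validation step having run first.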
Czipperz, answered Sep 29 '22 16:09