I want to construct a regex that matches either ' or " and then matches other characters, ending when a ' or a " respectively is matched, depending on which one was encountered at the start. This problem appears simple enough to solve with a backreference at the end; here is the regex below (it's in Java, so mind the extra escape characters such as the \ before the "):
private static String seekerTwo = "(['\"])([a-zA-Z])([a-zA-Z0-9():;/`\\=\\.\\,\\- ]+)(\\1)";
This code will successfully deal with things such as:
"hello my name is bob"
'i live in bethnal green'
The trouble comes when I have a String like this:
"hello this seat 'may be taken' already"
Using the above regex on it fails on the initial part upon encountering the ', then continues and successfully matches 'may be taken' ... but this is obviously insufficient; I need the whole String to be matched.
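To make the failure concrete, here is a minimal sketch that runs the regex above over the problem String and collects every match. The class name OriginalRegexDemo and the helper findMatches are made up for illustration; only the pattern itself comes from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string is taken from the question.
final class OriginalRegexDemo {
    private static final String SEEKER_TWO =
            "(['\"])([a-zA-Z])([a-zA-Z0-9():;/`\\=\\.\\,\\- ]+)(\\1)";

    // Collects every substring the original pattern matches in the input.
    static List<String> findMatches(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(SEEKER_TWO).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The inner single-quoted part is the only match, not the whole String.
        System.out.println(findMatches("\"hello this seat 'may be taken' already\""));
    }
}
```

The character class in the 3rd group contains neither quote character, so the attempt that starts at the outer " stops at the first ' and cannot reach the closing ".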
What I'm thinking is that I need a way to ignore the type of quotation mark that was NOT matched in the very first group, by including it as a character in the character set of the 3rd group. However, I know of no way to do this. Is there some sort of sneaky NOT-backreference function? Something I can use to reference the character that was NOT matched in the 1st group? Or otherwise some kind of solution to my predicament?
This can be done using negative lookahead assertions. The following solution even takes into account that you could escape a quote inside a string:
(["'])(?:\\.|(?!\1).)*\1
Explanation:
(["']) # Match and remember a quote.
(?: # Either match...
\\. # an escaped character
| # or
(?!\1) # (unless that character is identical to the quote character in \1)
. # any character
)* # any number of times.
\1 # Match the corresponding quote.
This correctly matches "hello this seat 'may be taken' already" or "hello this seat \"may be taken\" already".
In Java, with all the backslashes:
Pattern regex = Pattern.compile(
"([\"']) # Match and remember a quote.\n" +
"(?: # Either match...\n" +
" \\\\. # an escaped character\n" +
"| # or\n" +
" (?!\\1) # (unless that character is identical to the matched quote char)\n" +
" . # any character\n" +
")* # any number of times.\n" +
"\\1 # Match the corresponding quote",
Pattern.COMMENTS);
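As a quick check, here is a minimal, self-contained sketch that applies the same pattern (written on one line, without Pattern.COMMENTS) to the Strings from the question. The class name QuotedStringFinder and the helper findQuoted are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are for illustration only.
final class QuotedStringFinder {
    // Same pattern as above: opening quote, then escaped characters or any
    // character that is not the opening quote, then the same quote again.
    private static final Pattern QUOTED =
            Pattern.compile("([\"'])(?:\\\\.|(?!\\1).)*\\1");

    // Returns every quoted string found in the input, delimiters included.
    static List<String> findQuoted(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        // Nested quotes of the other kind are treated as ordinary characters.
        System.out.println(findQuoted("\"hello this seat 'may be taken' already\""));
        // Backslash-escaped quotes of the same kind do not end the match.
        System.out.println(findQuoted("\"hello this seat \\\"may be taken\\\" already\""));
    }
}
```

Note that . does not match line terminators by default, so a quoted string spanning several lines would need Pattern.DOTALL as well.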