I want to match strings like:
The sentence is 'He said "Hello there"'
The sentence is "He said 'Hello there'"
and get back a single capture (match) that is the sentence inside the outer single or double quotes.
^The sentence is (?:(?:'([^']*)')|(?:"([^"]*)"))$
The above regex gives me back 2 captured groups, one of them empty and the other containing the desired sentence.
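To make the behaviour concrete, here is a minimal sketch in Python. The language is an assumption, since the question does not name one, and the helper names `groups_for` and `captured_sentence` are made up for illustration. Note that Python's `re` module reports the group that did not take part in the match as `None` rather than as an empty string.

```python
import re

# The alternation pattern from the question, written as a raw string.
PATTERN = re.compile(r"""^The sentence is (?:(?:'([^']*)')|(?:"([^"]*)"))$""")


def groups_for(text):
    """Return the tuple of captured groups, or None if there is no match."""
    m = PATTERN.match(text)
    return m.groups() if m else None


def captured_sentence(text):
    """Return only the group that took part in the match (hypothetical helper)."""
    m = PATTERN.match(text)
    # lastindex is the index of the last capturing group that matched.
    return m.group(m.lastindex) if m else None
```

Picking the group through `lastindex` is one possible workaround when the regex itself must stay as it is; it does not change how many groups the pattern defines.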
^The sentence is (['"])(.*)\1$
Returns the quotation mark (single or double quote) as the 1st group and the sentence as the 2nd group.
If I make the first group non-capturing,
^The sentence is (?:['"])(.*)\1$
then I can no longer use the backreference, because \1 of course no longer refers to the single or double quote that was matched.
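The situation described here can be sketched in Python (an assumption; the question does not say which regex flavor is in use). Named groups do not remove the quote group from the match, but they let the caller ask for the sentence by name and ignore the rest. The group names `quote` and `sentence` and the helper `sentence_of` are hypothetical.

```python
import re

# Backreference version with named groups: (?P=quote) plays the role of \1.
QUOTED = re.compile(
    r"""^The sentence is (?P<quote>['"])(?P<sentence>.*)(?P=quote)$"""
)

# The non-capturing variant from the question. Here \1 refers to (.*),
# not to the opening quote, so the sample sentences no longer match.
NON_CAPTURING = re.compile(r"""^The sentence is (?:['"])(.*)\1$""")


def sentence_of(text):
    """Return the text between the matching outer quotes, or None."""
    m = QUOTED.match(text)
    return m.group("sentence") if m else None
```

The quote is still captured internally; the sketch only shows that the caller does not have to look at it.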
Is there a way to have groups whose "capture" can be referenced later in the regex, but whose captured value is not returned in the list of matches?
Or is there some other way to solve my (seemingly simple) problem?
Firstly, the double quote character is nothing special in regex. It's just another character, so it doesn't need escaping from the perspective of the regex. However, because Java uses double quotes to delimit String constants, if you want to create a string in Java with a double quote in it, you must escape it.
Try putting a backslash ( \ ) followed by " .
To match a character that has a special meaning in regex, you need to use an escape sequence, i.e. prefix the character with a backslash ( \ ). E.g., regex \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (backslash).
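The escaping rules above can be checked with a short sketch. It uses Python rather than Java (an assumption made so the example stays short), and `matches_exactly` is a hypothetical helper. Raw strings (`r"..."`) are used so that the backslashes reach the regex engine unchanged.

```python
import re


def matches_exactly(pattern, text):
    """Return True if the whole of text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None


# Each pair is (regex, literal text it is meant to match).
EXAMPLES = [
    (r"\.", "."),    # escaped dot matches a literal dot
    (r"\+", "+"),    # escaped plus matches a literal plus
    (r"\(", "("),    # escaped parenthesis matches a literal parenthesis
    (r"\\", "\\"),   # regex \\ matches a single backslash
    (r'"', '"'),     # a double quote needs no escaping in the regex itself
]


def escaped(text):
    """Let the library escape every regex metacharacter in text."""
    return re.escape(text)
```

When the text to match is not known in advance, `re.escape` builds the escaped pattern automatically, which avoids escaping by hand.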
Matching all outer single quotes in a given string: the regex will only match 'foo + hi' and ignore the inner quote in 'baz (minus the words, of course). This is quite handy when trying to fix malformed JSON that contains single quotes ( ' ) instead of the required double quotes ( " ).
Matching all outer double quotes in a given string: the regex will only match "with + quotes" and ignore the inner quote in stuff " in.
Following is the response I gave, slightly updated to improve clarity: First, to ensure we're on the same page, here are some examples of the kinds of quoted strings the regex will correctly match: In other words, it allows any number of escaped quotes of the same type as the enclosure.
Sadly, this elegant-looking approach does not work, because a backreference such as \1 cannot be used inside a character class:
(["'])(?:\\\1|[^\1]+)*\1
But with a small change, replacing the character class with a negative lookahead, everything works:
(["'])((?:\\\1|(?:(?!\1)).)*)(\1)
https://regex101.com/r/dKdBMT/2
I would like to be sure that this regex works in all cases, so please test it further.
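As a starting point for that kind of testing, here is a small sketch in Python (an assumption; the linked demo may use a different flavor). It applies the lookahead-based pattern to a line that contains both kinds of quotes. The helper name `find_literals` is made up for illustration.

```python
import re

# (["'])   group 1: the opening quote
# \\\1     an escaped copy of that same quote (backslash + backreference)
# (?!\1).  any other character that is not the closing quote
# (\1)     group 3: the closing quote, same kind as the opening one
STRING_LITERAL = re.compile(r"""(["'])((?:\\\1|(?:(?!\1)).)*)(\1)""")


def find_literals(text):
    """Return the body (group 2) of every quoted string found in text."""
    return [m.group(2) for m in STRING_LITERAL.finditer(text)]
```

One limitation worth knowing: `.` does not match a newline by default, so a quoted string that spans several lines is not matched unless the pattern is compiled with `re.DOTALL`.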
You want to make sure the quote symbols are properly matched, so a quote starting with a single quote ends with a single quote. Also, the regex should allow for escaping a quote symbol with a backslash if it's the same symbol (double or single quote symbol) bounding the string. Try this:
"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'
These samples match this regex:
'sing"le q\'uote'
"dou\"ble 'quote"
This one seems to work:
(?:'|").*(?:'|")
or
((?:'|").*(?:'|"))
if you need a group.
It works because * is a greedy quantifier, so you don't have to know which kind of quote is at the end: .* will take as much as possible.
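A short sketch of what "as much as possible" means in practice, written in Python (an assumption; the answer does not name a language) with a hypothetical helper `greedy_match`. Because the pattern never checks that the closing quote is the same kind as the opening one, it runs from the first quote in the text to the last one.

```python
import re

# The greedy pattern from the answer above, unchanged.
GREEDY = re.compile(r"""(?:'|").*(?:'|")""")


def greedy_match(text):
    """Return the text from the first quote to the last quote, or None."""
    m = GREEDY.search(text)
    return m.group(0) if m else None
```

This is fine when the input holds exactly one quoted string, as in the question. With several quoted strings on one line, or with mismatched quotes, the result covers more than a single string.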
One of the answers above is very accurate but needs some updates. Here it is:
(["'])((?:\\1|(?:(?!\1)).)*)(\1)
This will match all string literals.