Regex: match text in either single or double quotes

Tags:

regex

I want to match strings like:

The sentence is 'He said "Hello there"'
The sentence is "He said 'Hello there'"

and get back a single capture (match) that is the sentence inside the outer single or double quotes.

^The sentence is (?:(?:'([^']*)')|(?:"([^"]*)"))$

The above regex gives me back 2 captured groups, one of them empty and the other containing the desired sentence.
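To see what the alternation returns in practice, here is a minimal sketch using Python's re module. The choice of Python and the helper name captured_groups are assumptions for illustration, not part of the question. Note that Python reports the unused group as None rather than as an empty string; other regex flavors may differ.

```python
import re

# The alternation regex from the question: one capture group per quote style.
ALTERNATION = re.compile(
    r"""^The sentence is (?:(?:'([^']*)')|(?:"([^"]*)"))$"""
)


def captured_groups(text):
    """Return the tuple of capture groups, or None if the text does not match."""
    m = ALTERNATION.match(text)
    return m.groups() if m else None


single_outer = "The sentence is 'He said \"Hello there\"'"
double_outer = "The sentence is \"He said 'Hello there'\""

print(captured_groups(single_outer))  # group 1 is filled, group 2 is unused
print(captured_groups(double_outer))  # group 2 is filled, group 1 is unused
```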

^The sentence is (['"])(.*)\1$

This returns the quotation mark (single or double quote) as the 1st group and the sentence as the 2nd group.

If I make the first group non-capturing,

^The sentence is (?:['"])(.*)\1$

then I can no longer use the backreference, because \1 no longer refers to the single or double quote match.
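The difference between the capturing and non-capturing variants can be checked with a short Python sketch (Python is an assumption here; the question does not name a language). In the non-capturing variant, \1 ends up referring to the (.*) group itself, so the pattern only matches when the sentence is followed by an exact copy of itself.

```python
import re

# Capturing quote group: \1 refers to the opening quote.
CAPTURING = re.compile(r"""^The sentence is (['"])(.*)\1$""")

# Non-capturing quote group: \1 now refers to (.*), not to the quote.
NON_CAPTURING = re.compile(r"""^The sentence is (?:['"])(.*)\1$""")

sample = "The sentence is 'He said \"Hello there\"'"

capturing_match = CAPTURING.match(sample)
non_capturing_match = NON_CAPTURING.match(sample)

print(capturing_match.groups())   # the quote character and the sentence
print(non_capturing_match)        # no match for this sample
```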

Is there a way to have groups whose "capture" can be referenced later in the regex, but whose captured value is not returned in the list of matches?

Or is there some other way to solve my (seemingly simple) problem?
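One possible workaround, sketched here in Python as an assumption about the target language, is to name the groups and read only the one that is needed. The group names quote and sentence and the helper extract_sentence are hypothetical. The quote is still captured internally; the caller simply never asks for it.

```python
import re

# Named groups: (?P=quote) is a backreference to the opening quote.
NAMED = re.compile(
    r"""^The sentence is (?P<quote>['"])(?P<sentence>.*)(?P=quote)$"""
)


def extract_sentence(text):
    """Return only the text inside the outer quotes, or None if no match."""
    m = NAMED.match(text)
    return m.group("sentence") if m else None


print(extract_sentence("The sentence is 'He said \"Hello there\"'"))
print(extract_sentence("The sentence is \"He said 'Hello there'\""))
```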

Phil Davis asked Oct 27 '17 04:10



4 Answers

Unfortunately, this elegant-looking approach does not work, because in most regex flavors a backreference such as \1 cannot be used inside a character class:

(["'])(?:\\\1|[^\1]+)*\1

With a small change, replacing the character class with a negative lookahead, it works:

(["'])((?:\\\1|(?:(?!\1)).)*)(\1)

https://regex101.com/r/dKdBMT/2

I have not verified that this regex works in all cases, so please test it further.
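As a quick check of the lookahead version, here is a minimal Python sketch (Python and the helper name quoted_bodies are assumptions). It collects group 2, the text between the quotes, for every quoted string found in the input, including ones that contain an escaped quote of the enclosing type.

```python
import re

# Group 1: opening quote. Group 2: body. Group 3: closing quote.
# The body is either an escaped copy of the opening quote (\\\1)
# or any character that is not the opening quote ((?!\1).).
QUOTED = re.compile(r"""(["'])((?:\\\1|(?:(?!\1)).)*)(\1)""")


def quoted_bodies(text):
    """Return the body (group 2) of every quoted string found in text."""
    return [m.group(2) for m in QUOTED.finditer(text)]


# The input is: say "a\"b" and 'c\'d'
print(quoted_bodies('say "a\\"b" and \'c\\\'d\''))
```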

redisko answered Oct 21 '22 03:10


You want to make sure the quote symbols are properly matched, so a quote starting with a single quote ends with a single quote. Also, the regex should allow for escaping a quote symbol with a backslash if it's the same symbol (double or single quote symbol) bounding the string. Try this:

"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'

These samples match this regex:

'sing"le q\'uote'

"dou\"ble 'quote"
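The two samples can be checked with a short Python sketch (Python and the helper name is_quoted are assumptions). The pattern is assembled from its two halves only to keep the string literals readable; the resulting regex is the one shown above.

```python
import re

# Double-quoted branch: non-quote, non-backslash characters or any escaped character.
DOUBLE = r'"(?:[^"\\]|\\.)*"'
# Single-quoted branch: the same idea with single quotes.
SINGLE = r"'(?:[^'\\]|\\.)*'"

QUOTED = re.compile(DOUBLE + "|" + SINGLE)


def is_quoted(text):
    """True if the whole text is one properly quoted string."""
    return QUOTED.fullmatch(text) is not None


print(is_quoted("'sing\"le q\\'uote'"))    # the first sample
print(is_quoted('"dou\\"ble \'quote"'))    # the second sample
print(is_quoted("'abc\""))                 # mismatched quotes
```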

tgoneil answered Oct 21 '22 03:10


This one seems to work:

(?:'|").*(?:'|")

or

((?:'|").*(?:'|"))

if you need a group.

Here's the demo: link

This works because * is a greedy quantifier: .* consumes as much as possible, so you don't need to know which kind of quote is at the end.
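A small Python sketch (Python and the helper name outer_quoted are assumptions) shows the greedy behavior on the sentence from the question. It also shows a limitation worth knowing: since the pattern does not tie the closing quote to the opening one, a string with mismatched quotes is matched as well.

```python
import re

# Any quote, then as much as possible, then any quote.
GREEDY = re.compile(r"""(?:'|").*(?:'|")""")


def outer_quoted(text):
    """Return the span from the first quote to the last quote, or None."""
    m = GREEDY.search(text)
    return m.group(0) if m else None


print(outer_quoted("The sentence is 'He said \"Hello there\"'"))
print(outer_quoted("'abc\""))  # mismatched quotes are accepted too
```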

Egan Wolf answered Oct 21 '22 04:10


One of the answers above is very accurate, but it needs a small update. Here it is:

(["'])((?:\\1|(?:(?!\1)).)*)(\1)

This will match everything as string literals.
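When comparing this pattern with the similar three-backslash one, note that \\1 reads as an escaped backslash followed by a literal 1, whereas \\\1 reads as an escaped backslash followed by a backreference to the opening quote. The Python sketch below (Python and the variable names are assumptions) runs both variants on an input containing an escaped quote, so the difference can be observed directly.

```python
import re

# Three backslashes: literal backslash, then backreference \1.
WITH_BACKREF = re.compile(r"""(["'])((?:\\\1|(?:(?!\1)).)*)(\1)""")

# Two backslashes: literal backslash, then the digit 1.
WITH_LITERAL_ONE = re.compile(r"""(["'])((?:\\1|(?:(?!\1)).)*)(\1)""")

# The input is: "a\"b"
text = '"a\\"b"'

body_backref = WITH_BACKREF.search(text).group(2)
body_literal = WITH_LITERAL_ONE.search(text).group(2)

print(body_backref)  # the escaped quote stays inside the body
print(body_literal)  # the match stops at the escaped quote
```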

Yogesh Sonawane answered Oct 21 '22 04:10