
JS regular expression to find a substring surrounded by double quotes

Tags:

javascript

I need to find a substring surrounded by double quotes, for example "test", "te\"st" or "", but not """ or "\". Which of the following expressions is the best way to achieve this?

1) /".*"/g
2) /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g
3) /"(?:\\?[\S\s])*?"/g
4) /"([^"\\]*("|\\[\S\s]))+/g

I was asked this question yesterday during an interview, and would like to know the answer for future reference.

Om3ga asked Mar 14 '13 07:03



1 Answer

These expressions evaluate as follows:

Expression 1 matches:

  • A double quote
  • Greedily any characters except line terminators, including double quotes and backslashes
  • A final double quote

This would match "test" some wrong text "text" as a single substring, and therefore fails.
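The greedy behaviour of expression 1 can be checked directly. This is a minimal sketch; the variable names and the sample input are only illustrative.

```javascript
// Expression 1 from the question: a greedy dot between two double quotes.
const expr1 = /".*"/g;

// Illustrative input with two quoted substrings and unquoted text between them.
const input = '"test" some wrong text "text"';

// With the g flag, match() returns every non-overlapping match.
const matches = input.match(expr1) || [];

console.log(matches.length);       // 1
console.log(matches[0] === input); // true: the single match spans the whole input
```

Because `.*` runs to the end of the line and only backtracks as far as the last double quote, the two quoted substrings are never reported separately.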

Expression 2 matches:

  • A double quote
  • Greedily as many characters as possible that are neither a double quote nor a backslash
  • Greedily as many sets as possible of:
    • A backslash followed by any character
    • Greedily as many characters as possible that are neither a double quote nor a backslash
  • A final double quote

So this collects all characters within the double quotes in sets, broken by backslashes. A double quote preceded by a backslash is always consumed together with that backslash as an escaped pair, so it can never be taken as the closing quote. This will work.
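As a rough check of expression 2 against the examples in the question, the sketch below runs it on one well-formed input and on the "\" case. The helper name findQuoted and the sample sentence are assumptions made for illustration; String.raw is used so the backslashes in the test strings are taken literally.

```javascript
// Expression 2 from the question.
const expr2 = /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g;

// Hypothetical helper: returns all matches, or an empty array when there are none.
const findQuoted = (text) => text.match(expr2) || [];

// Contains "test", "te\"st" and "" as three separate quoted substrings.
const wellFormed = findQuoted(String.raw`say "test" and "te\"st" and ""`);

// The three-character input "\" has an escaped quote and no closing quote.
const unterminated = findQuoted(String.raw`"\"`);

console.log(wellFormed.length);   // 3
console.log(unterminated.length); // 0
```

The backslash can only be consumed as part of an escaped pair, so there is no alternative path for the engine to fall back on when the closing quote is missing.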

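Expression 3 matches:

  • A double quote
  • As few sets as fit of:
    • Any one character, optionally preceded by a backslash
  • A final double quote

This collects characters one at a time, each optionally preceded by a backslash, but not greedily. It handles "test", "te\"st" and "". However, because the backslash is optional, the engine can backtrack and consume a backslash as an ordinary character, which leaves the double quote after it free to close the match. It therefore matches "\", which the question rules out, and fails.

Expression 4 matches:

  • A double quote
  • One or more sets of:
    • Greedily as many characters as possible that are neither a double quote nor a backslash
    • Followed by either a double quote, or a backslash and any character

Here a double quote does not end the match; the group simply repeats. This will match "test"\x, and therefore fails.

Conclusion:

Only expression 2 satisfies every requirement in the question. Expression 3 is simpler and gives the same results on well-formed input, but it wrongly accepts "\", so simplicity alone is not a reason to prefer it. The answer is expression 2.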
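Expressions 3 and 4 can be checked the same way. The sketch below runs them on the inputs discussed, with expression 2 included for contrast; the helper name findAll is an assumption made for illustration.

```javascript
// Expressions 2, 3 and 4 from the question.
const expr2 = /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g;
const expr3 = /"(?:\\?[\S\s])*?"/g;
const expr4 = /"([^"\\]*("|\\[\S\s]))+/g;

// Hypothetical helper: returns all matches, or an empty array when there are none.
const findAll = (regex, text) => text.match(regex) || [];

// String.raw keeps the backslashes in the sample inputs literal.
const escapedQuote = String.raw`"te\"st"`;
const loneBackslash = String.raw`"\"`;
const trailingEscape = String.raw`"test"\x`;

// Expression 3 copes with an escaped quote in well-formed input...
console.log(findAll(expr3, escapedQuote)[0] === escapedQuote);   // true

// ...but backtracking lets it accept "\" as a complete match.
console.log(findAll(expr3, loneBackslash)[0] === loneBackslash); // true

// Expression 2 rejects the same input.
console.log(findAll(expr2, loneBackslash).length);               // 0

// Expression 4 runs past the closing quote and swallows \x as well.
console.log(findAll(expr4, trailingEscape)[0] === trailingEscape); // true
```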

Gareth Cornish answered Oct 08 '22 00:10