
Matching items in a comma-delimited list which aren't surrounded by single or double quotes

I'm wanting to match any instance of text in a comma-delimited list. For this, the following regular expression works great:

/[^,]+/g

(Regex101 demo).

The problem is that I'm wanting to ignore any commas which are contained within either single or double quotes and I'm unsure how to extend the above selector to allow me to do that.

Here's an example string:

abcd, efgh, ij"k,l", mnop, 'q,rs't

I'm wanting to either match the five chunks of text or match the four relevant commas (so I can retrieve the data using split() instead of match()):

  1. abcd
  2. efgh
  3. ij"k,l"
  4. mnop
  5. 'q,rs't

Or:

abcd, efgh, ij"k,l", mnop, 'q,rs't
    ^     ^        ^     ^

How can I do this?


Three relevant questions exist, but none of them cater for both ' and " in JavaScript:

  1. Regex for splitting a string using space when not surrounded by single or double quotes - Java solution, doesn't appear to work in JavaScript.
  2. A regex to match a comma that isn't surrounded by quotes - Only matches on "
  3. Alternative to regex: match all instances not inside quotes - Only matches on "
James Donnelly, asked Mar 14 '16 14:03


2 Answers

Okay, so your matching groups can contain:

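  • A run of characters that are not commas or quotes
  • A matching pair of " with anything except " between them
  • A matching pair of ' with anything except ' between them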

So this should work:

/((?:[^,"']+|"[^"]*"|'[^']*')+)/g

RegEx101 Demo
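As a rough sketch of how the pattern above could be used with match(), here it is applied to the example string from the question. The helper name matchFields is made up for illustration, and the trim() call is there because each match after the first starts with the space that follows the comma.

```javascript
// Pattern from the answer: runs of non-comma, non-quote characters,
// or complete "..." / '...' sections, repeated one or more times.
const chunkPattern = /((?:[^,"']+|"[^"]*"|'[^']*')+)/g;

// matchFields is a hypothetical helper name, not part of the original answer.
function matchFields(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = text.match(chunkPattern) || [];
  // Every chunk after the first keeps the space that followed the comma.
  return matches.map((chunk) => chunk.trim());
}

console.log(matchFields(`abcd, efgh, ij"k,l", mnop, 'q,rs't`));
// → [ 'abcd', 'efgh', 'ij"k,l"', 'mnop', "'q,rs't" ]
```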

As a nice bonus, you can put single quotes inside a double-quoted section, and vice versa. However, you'll probably need a state machine to support escaped double quotes inside double-quoted strings (e.g. "aa\"aa").

Unfortunately it matches the leading space as well, so you'll have to trim the matches.
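For the escaped-quote case, a state machine along these lines might do. This is only a minimal sketch under a few assumptions of mine: a backslash escapes the next character only while inside quotes, escapes are kept verbatim in the output, and an unterminated quote simply runs to the end of the string. The name splitOutsideQuotes is hypothetical.

```javascript
// splitOutsideQuotes is a hypothetical helper, not from the original answer.
function splitOutsideQuotes(text) {
  const fields = [];
  let current = "";
  let quote = null; // the quote character we are currently inside, or null

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (quote !== null && ch === "\\" && i + 1 < text.length) {
      // Assumption: inside quotes a backslash escapes the next character.
      // Both characters are copied through unchanged.
      current += ch + text[i + 1];
      i++;
    } else if (quote !== null) {
      if (ch === quote) quote = null; // closing quote
      current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch; // opening quote
      current += ch;
    } else if (ch === ",") {
      // A comma outside quotes ends the current field.
      fields.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current.trim());
  return fields;
}

console.log(splitOutsideQuotes(String.raw`"aa\"aa", b`));
// → [ '"aa\\"aa"', 'b' ]  (the first field is "aa\"aa" with its backslash kept)
```

Unlike the regex, this tracks which quote character opened the current section, so an escaped quote does not end it.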

Gustav Bertram, answered Nov 01 '22 03:11


You can use two lookaheads to assert that the matched comma is outside quotes:

/(?=(([^"]*"){2})*[^"]*$)(?=(([^']*'){2})*[^']*$)\s*,\s*/g
  • (?=(([^"]*"){2})*[^"]*$) asserts that there is an even number of double quotes ahead of the matching comma.
  • (?=(([^']*'){2})*[^']*$) makes the same assertion for single quotes.

PS: This doesn't handle unbalanced, nested or escaped quotes.

RegEx Demo
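One thing to watch when using this with split(): JavaScript's split() splices the text of any capturing groups into the result array, so the pattern as written would add extra entries. A sketch that avoids this by making the inner groups non-capturing (my modification, not part of the regex above; splitOnUnquotedCommas is a made-up name):

```javascript
// Same double-lookahead idea, with the inner groups made non-capturing so
// split() does not splice captured text into the result.
const separator =
  /(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)\s*,\s*/;

// splitOnUnquotedCommas is a hypothetical helper name.
function splitOnUnquotedCommas(text) {
  return text.split(separator);
}

console.log(splitOnUnquotedCommas(`abcd, efgh, ij"k,l", mnop, 'q,rs't`));
// → [ 'abcd', 'efgh', 'ij"k,l"', 'mnop', "'q,rs't" ]
```

Because \s*,\s* consumes the whitespace around each comma, no trimming is needed here.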

anubhava, answered Nov 01 '22 03:11