 

A regex to match a comma that isn't surrounded by quotes

I'm using Clojure, so this is in the context of Java regexes.

Here is an example string:

{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}

The important bits are the commas after each string value. I'd like to replace them with newline characters using Java's replaceAll method, so a regex that matches any comma that is not inside a quoted string will do.

If I'm not coming across well, please ask and I'll be happy to clarify anything.

edit: sorry for the confusion in the title. I haven't been awake very long.

String: {:a "ab, cd efg",} <-- In this example, the comma at the end would be matched, but the one inside the quotes would not.

String: {:a 3, :b 3,} <-- Every single comma matches.

String: {:a "abcd,efg" :b "abcedg,e"} <-- No comma matches.

asked Apr 23 '10 at 18:04 by Rayne



1 Answer

The regex:

,\s*(?=([^"]*"[^"]*")*[^"]*$)
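A minimal Java sketch of the replacement the question asks for, using String.replaceAll (Clojure's #"..." literals compile to the same java.util.regex.Pattern). The class and method names are made up for illustration. Because \s* is part of the match, the space after each comma is replaced too, so each following key starts at the beginning of its line.

```java
public class QuoteAwareCommas {
    // The answer's regex as a Java string literal: each " is written \" and \s is written \\s.
    static final String UNQUOTED_COMMA = ",\\s*(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Replaces every comma outside a quoted string, plus the whitespace after it, with a newline.
    static String commasToNewlines(String s) {
        return s.replaceAll(UNQUOTED_COMMA, "\n");
    }

    public static void main(String[] args) {
        String input = "{:a \"ab,cd, efg\", :b \"ab,def, egf,\", :c \"Conjecture\"}";
        System.out.println(commasToNewlines(input));
    }
}
```

For the question's example this should print:

{:a "ab,cd, efg"
:b "ab,def, egf,"
:c "Conjecture"}

From Clojure, the same pattern should work with clojure.string/replace, as long as every " inside the #"..." literal stays escaped as \".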

Matches:

{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}
                ^                  ^

and:

{:a "ab, cd efg",}
                ^

and does not match a comma in:

{:a "abcd,efg" :b "abcedg,e"}
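To check the regex against the caret diagrams and the three test strings from the question, this hypothetical Java helper collects the index of every comma the pattern matches, using Matcher.find(). Since each match starts with the comma itself, m.start() is the comma's position. It should report [16, 35] for the first example, [16] for {:a "ab, cd efg",}, [5, 11] for {:a 3, :b 3,} and [] for {:a "abcd,efg" :b "abcedg,e"}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnquotedCommaPositions {
    static final Pattern UNQUOTED_COMMA =
        Pattern.compile(",\\s*(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Index of every comma the pattern matches; each match begins at the comma.
    static List<Integer> matchedCommaIndices(String s) {
        List<Integer> indices = new ArrayList<>();
        Matcher m = UNQUOTED_COMMA.matcher(s);
        while (m.find()) {
            indices.add(m.start());
        }
        return indices;
    }

    public static void main(String[] args) {
        String[] examples = {
            "{:a \"ab, cd efg\",}",              // only the trailing comma
            "{:a 3, :b 3,}",                     // every comma
            "{:a \"abcd,efg\" :b \"abcedg,e\"}"  // no comma
        };
        for (String ex : examples) {
            System.out.println(ex + " -> " + matchedCommaIndices(ex));
        }
    }
}
```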

But when escaped quotes can appear, like so:

{:a "ab,\" cd efg",} // only the last comma should match

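then a regex solution like this one won't work: it counts the escaped \" as an ordinary quote, so the comma right after ab is matched as well.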
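If escaped quotes are possible, one option is to drop the regex and scan the string once, tracking whether the current position is inside a string literal. The sketch below is not from the answer and its names are illustrative. It assumes the only escape mechanism is a backslash inside a string (so it does not understand Clojure character literals such as \" written outside a string), and it drops the whitespace after each comma the way \s* does in the regex. For {:a "ab,\" cd efg",} it should replace only the last comma.

```java
public class EscapeAwareCommas {
    // The characters Java's \s matches by default: [ \t\n\x0B\f\r]
    static boolean isRegexSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '\f' || c == '\r';
    }

    // Replaces every comma outside a "..." string, plus the whitespace after it,
    // with '\n'. Assumption: inside a string, a backslash escapes the next
    // character, so \" does not end the string.
    static String commasToNewlines(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean inString = false;
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (inString) {
                out.append(c);
                if (c == '\\' && i + 1 < s.length()) {
                    // copy the escaped character verbatim and skip past it
                    out.append(s.charAt(i + 1));
                    i += 2;
                    continue;
                }
                if (c == '"') {
                    inString = false; // unescaped quote closes the string
                }
                i++;
            } else if (c == '"') {
                inString = true; // quote outside a string opens one
                out.append(c);
                i++;
            } else if (c == ',') {
                out.append('\n');
                i++;
                // mirror the regex's \s*: skip whitespace following the comma
                while (i < s.length() && isRegexSpace(s.charAt(i))) {
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "{:a \"ab,\\\" cd efg\",}";
        System.out.println(commasToNewlines(input));
    }
}
```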

A brief explanation of the regex:

,            # match the character ','
\s*          # match a whitespace character: [ \t\n\x0B\f\r] and repeat it zero or more times
(?=          # start positive look ahead
  (          #   start capture group 1
    [^"]*    #     match any character other than '"' and repeat it zero or more times
    "        #     match the character '"'
    [^"]*    #     match any character other than '"' and repeat it zero or more times
    "        #     match the character '"'
  )*         #   end capture group 1 and repeat it zero or more times
  [^"]*      #   match any character other than '"' and repeat it zero or more times
  $          #   match the end of the input
)            # end positive look ahead

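In other words: match any comma that is followed by an even number of quotes (zero included) up to the end of the string. As long as the quotes in the input are balanced, a comma inside a string always has an odd number of quotes after it, while a comma outside every string has an even number.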
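Java can also compile the commented form of the regex directly. With the Pattern.COMMENTS flag (or an inline (?x) at the start of the pattern, which also works in a Clojure #"..." literal), whitespace in the pattern is ignored and # starts a comment that runs to the end of the line. A sketch with the explanation condensed into the comments; the class name is illustrative, and each comment line must end in \n so the next line is not swallowed by it.

```java
import java.util.regex.Pattern;

public class CommentedUnquotedComma {
    // Same regex as the answer's, compiled in COMMENTS mode.
    static final Pattern UNQUOTED_COMMA = Pattern.compile(
        ",            # the comma itself\n" +
        "\\s*         # any whitespace after it\n" +
        "(?=          # look ahead without consuming\n" +
        "  (          #   group 1: one pair of quotes\n" +
        "    [^\"]*\"  #     up to and including an opening quote\n" +
        "    [^\"]*\"  #     up to and including its closing quote\n" +
        "  )*         #   zero or more such pairs\n" +
        "  [^\"]*     #   then no more quotes\n" +
        "  $          #   until the end of the input\n" +
        ")",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        String input = "{:a \"ab,cd, efg\", :b \"ab,def, egf,\", :c \"Conjecture\"}";
        System.out.println(UNQUOTED_COMMA.matcher(input).replaceAll("\n"));
    }
}
```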

answered Nov 05 '22 at 04:11 by Bart Kiers
