I'm using Clojure, so this is in the context of Java regexes.
Here is an example string:
{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}
The important bits are the commas after each string. I'd like to be able to replace them with newline characters with Java's replaceAll method. A regex that will match any comma that is not surrounded by quotes will do.
If I'm not coming across well, please ask and I'll happily clarify anything.
edit: sorry for the confusion in the title. I haven't been awake very long.
String: {:a "ab, cd efg",}
<-- In this example, the comma at the end would be matched, but the one inside the quotes would not.
String: {:a 3, :b 3,}
<-- Every single comma matches.
String: {:a "abcd,efg" :b "abcedg,e"}
<-- None of the commas match.
The regex:
,\s*(?=([^"]*"[^"]*")*[^"]*$)
Matches:
{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}
                ^                  ^
and:
{:a "ab, cd efg",}
                ^
and does not match a comma in:
{:a "abcd,efg" :b "abcedg,e"}
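As a minimal sketch of how this could be used with replaceAll, here is the regex written as a Java string literal (quotes and the backslash in \s have to be escaped). The class and method names are made up for illustration; only String.replaceAll is the real API.

```java
public class CommaOutsideQuotes {
    // The regex from above, escaped for a Java string literal.
    static final String COMMA_OUTSIDE_QUOTES =
            ",\\s*(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: replaces every comma outside quotes
    // (and the whitespace right after it) with a newline.
    static String commasToNewlines(String input) {
        return input.replaceAll(COMMA_OUTSIDE_QUOTES, "\n");
    }

    public static void main(String[] args) {
        String s = "{:a \"ab,cd, efg\", :b \"ab,def, egf,\", :c \"Conjecture\"}";
        System.out.println(commasToNewlines(s));
    }
}
```

Since replaceAll is an ordinary method on java.lang.String, it can be called from Clojure through Java interop as well.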
But when escaped quotes can appear, like so:
{:a "ab,\" cd efg",} // only the last comma should match
then this regex solution won't work: it counts the escaped quote like any other quote, which throws off the even/odd check.
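For input with escaped quotes, one possible alternative is a small hand-written scanner instead of a regex. The sketch below is not part of the regex answer; it assumes that a backslash inside quotes escapes the next character, and the class and method names are hypothetical.

```java
public class QuoteAwareCommas {
    // Replaces each comma outside double quotes (plus the whitespace
    // after it) with a newline.
    // Assumption: inside quotes, a backslash escapes the next character.
    static String commasToNewlines(String input) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < input.length()) {
                // Copy the escape sequence verbatim so \" does not end the string.
                out.append(c).append(input.charAt(i + 1));
                i += 2;
            } else if (c == '"') {
                inQuotes = !inQuotes;
                out.append(c);
                i++;
            } else if (c == ',' && !inQuotes) {
                out.append('\n');
                i++;
                // Skip whitespace after the comma, roughly mirroring ,\s* in the regex.
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With the escaped-quote example above, only the final comma is replaced; the comma before \" stays as it is.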
A brief explanation of the regex:
,          # match the character ','
\s*        # match a whitespace character: [ \t\n\x0B\f\r] and repeat it zero or more times
(?=        # start positive look ahead
  (        #   start capture group 1
    [^"]*  #     match any character other than '"' and repeat it zero or more times
    "      #     match the character '"'
    [^"]*  #     match any character other than '"' and repeat it zero or more times
    "      #     match the character '"'
  )*       #   end capture group 1 and repeat it zero or more times
  [^"]*    #   match any character other than '"' and repeat it zero or more times
  $        #   match the end of the input
)          # end positive look ahead
In other words: match any comma (plus any whitespace right after it) that has an even number of quotes, possibly zero, between it and the end of the string.
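If you want to keep such an explanation next to the pattern in code, Java's Pattern.COMMENTS flag lets whitespace and # comments live inside the regex itself. This is only a sketch of that option; the class and method names are invented for the example, and the comments are shortened.

```java
import java.util.regex.Pattern;

public class CommentedRegex {
    // Same regex as above, compiled with Pattern.COMMENTS so that
    // whitespace and "# ..." comments inside the pattern are ignored.
    static final Pattern COMMA_OUTSIDE_QUOTES = Pattern.compile(
            ",          # a comma\n" +
            "\\s*       # optional whitespace after it\n" +
            "(?=        # look ahead, without consuming:\n" +
            "  (        #   group 1\n" +
            "    [^\"]* #     non-quote characters\n" +
            "    \"     #     an opening quote\n" +
            "    [^\"]* #     non-quote characters\n" +
            "    \"     #     the closing quote\n" +
            "  )*       #   repeated zero or more times\n" +
            "  [^\"]*   #   trailing non-quote characters\n" +
            "  $        #   up to the end of the input\n" +
            ")          # end of look ahead\n",
            Pattern.COMMENTS);

    // Hypothetical helper: replaces each matching comma with a newline.
    static String commasToNewlines(String input) {
        return COMMA_OUTSIDE_QUOTES.matcher(input).replaceAll("\n");
    }
}
```

Each comment has to end with a newline character inside the string, otherwise it would swallow the rest of the pattern.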