Regex to pick characters outside of pair of quotes

Tags:

regex

I would like to find a regex that will pick out all commas that fall outside quote sets.

For example:

'foo' => 'bar', 'foofoo' => 'bar,bar' 

This would pick out the single comma on line 1, after 'bar'.

I don't really care about single vs double quotes.

Has anyone got any thoughts? I feel like this should be possible with lookaheads, but my regex-fu is too weak.

SocialCensus asked Mar 10 '09 22:03



2 Answers

This will match everything from the start of the string up to and including the first comma that is not inside double quotes. Is that what you want?

/^([^"]|"[^"]*")*?(,)/ 
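A minimal Ruby sketch of what this first pattern captures. The sample string and the constant name are made up for illustration and are not part of the answer.

```ruby
# Pattern from the answer: lazily consume unquoted characters or whole
# "..." blocks, then stop at the first comma outside double quotes.
FIRST_UNQUOTED_COMMA = /^([^"]|"[^"]*")*?(,)/

sample = '"a,b" c, d, "e,f"' # hypothetical input
match = FIRST_UNQUOTED_COMMA.match(sample)

prefix = match[0]            # text up to and including the first unquoted comma
comma_index = match.begin(2) # offset of that comma within the string

puts prefix      # "a,b" c,
puts comma_index # 7
```

The comma inside "a,b" is swallowed as part of the quoted block, so the second capture group lands on the comma after c.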

If you want all of them (and as a counter-example to the guy who said it wasn't possible) you could write:

/(,)(?=(?:[^"]|"[^"]*")*$)/ 

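which will match all of them: the lookahead only succeeds when everything after the comma, up to the end of the line, consists of non-quote characters and complete quoted blocks. Thus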

'test, a "comma,", bob, ",sam,",here'.gsub(/(,)(?=(?:[^"]|"[^"]*")*$)/,';') 

replaces all the commas not inside quotes with semicolons, and produces:

'test; a "comma,"; bob; ",sam,";here' 
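The question's sample uses single quotes, while the pattern above only looks at double quotes. A minimal sketch, assuming the same lookahead with ' swapped in for " (an adaptation, not something the answer states), applied to the line from the question:

```ruby
# Same lookahead idea, with ' in place of " (assumed adaptation).
OUTSIDE_SINGLE_QUOTES = /,(?=(?:[^']|'[^']*')*$)/

line = "'foo' => 'bar', 'foofoo' => 'bar,bar'"

# Collect the offset of every comma that falls outside single quotes.
positions = []
line.scan(OUTSIDE_SINGLE_QUOTES) { positions << Regexp.last_match.begin(0) }

replaced = line.gsub(OUTSIDE_SINGLE_QUOTES, ';')

puts positions.inspect # [14]
puts replaced          # 'foo' => 'bar'; 'foofoo' => 'bar,bar'
```

A character class such as ["'] in place of each quote would not be a drop-in replacement when the two quote styles are mixed in one string, since an opening ' could then be closed by a ".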

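If quoted text can span line breaks, the behaviour depends on the regex flavor. In Ruby, $ already matches at the end of every line and the m flag only changes what . matches, so adding m has no effect on this pattern; anchoring the lookahead with \z instead of $ makes it check the quotes all the way to the end of the string.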
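For Ruby specifically, the handling of line breaks is worth checking by hand, because $ there matches at the end of every line. A small sketch, where the \z variant and the sample text are assumptions for illustration rather than part of the answer:

```ruby
# Pattern from the answer, anchored with $ (end of line in Ruby).
LINE_ANCHORED = /,(?=(?:[^"]|"[^"]*")*$)/
# Same pattern with the m flag; in Ruby m only affects what . matches.
LINE_ANCHORED_M = /,(?=(?:[^"]|"[^"]*")*$)/m
# Assumed variant anchored with \z (end of the whole string).
STRING_ANCHORED = /,(?=(?:[^"]|"[^"]*")*\z)/

text = %(a,"b,\nc",d) # a quoted field that contains a line break

by_line   = text.gsub(LINE_ANCHORED, ';')
with_m    = text.gsub(LINE_ANCHORED_M, ';')
by_string = text.gsub(STRING_ANCHORED, ';')

puts by_line.inspect   # the comma inside the quoted field is replaced too
puts with_m.inspect    # identical to by_line
puts by_string.inspect # only the two commas outside the quotes are replaced
```

With $, the lookahead for the comma after b succeeds immediately at the line break, so that quoted comma is treated as if it were outside the quotes; \z avoids this.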

MarkusQ answered Oct 07 '22 08:10


The regexes below match all the commas that are outside double quotes:

,(?=(?:[^"]*"[^"]*")*[^"]*$) 
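As a quick illustration, this lookahead pattern can be passed to Ruby's String#split to break a line into fields. Splitting is a use case assumed here for the example, and the sample string extends the buz,"bar,foo" input with one extra field:

```ruby
# Pattern from the answer: a comma followed by an even number of
# double quotes before the end of the line.
OUTSIDE_DOUBLE_QUOTES = /,(?=(?:[^"]*"[^"]*")*[^"]*$)/

record = 'buz,"bar,foo",qux' # hypothetical input
fields = record.split(OUTSIDE_DOUBLE_QUOTES)

puts fields.inspect # ["buz", "\"bar,foo\"", "qux"]
```

Because the lookahead is zero-width, only the comma itself is consumed by the split and the quoted field stays intact.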


Or (PCRE only):

"[^"]*"(*SKIP)(*F)|, 

"[^"]*" matches a whole double-quoted block. For the input buz,"bar,foo", it matches "bar,foo" only. The (*SKIP)(*F) that follows forces that match to fail and stops the engine from retrying inside the text it has just consumed. The alternative after the | symbol, a bare comma, is therefore only ever tried on text outside the quoted blocks, so here it matches only the comma just after buz. The comma inside the double quotes is never matched, because the quoted part has already been skipped.

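Ruby's regex engine is not PCRE and, as far as I know, does not offer the (*SKIP)(*F) verbs. A rough sketch of the same skip-the-quoted-part idea in Ruby, using gsub with a block instead of the verbs (this substitution is an assumption of the example, not the answer's method):

```ruby
# Match either a whole "..." block or a bare comma. Quoted blocks are
# consumed as one match, so a comma inside them never matches on its own.
QUOTED_OR_COMMA = /"[^"]*"|,/

input = 'buz,"bar,foo"'

# Put quoted blocks back unchanged; replace only the bare commas.
result = input.gsub(QUOTED_OR_COMMA) { |m| m == ',' ? ';' : m }

puts result # buz;"bar,foo"
```

The block plays the role of (*SKIP)(*F): matches of the first alternative are discarded by returning them untouched.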


The regex below matches all the commas that are inside double quotes:

,(?!(?:[^"]*"[^"]*")*[^"]*$) 
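A short Ruby check of this negative-lookahead variant, on a made-up sample string:

```ruby
# Pattern from the answer: a comma NOT followed by an even number of
# double quotes, i.e. a comma that sits inside a quoted block.
INSIDE_DOUBLE_QUOTES = /,(?!(?:[^"]*"[^"]*")*[^"]*$)/

record = 'buz,"bar,foo",qux' # hypothetical input
marked = record.gsub(INSIDE_DOUBLE_QUOTES, ';')

puts marked # buz,"bar;foo",qux
```

Only the comma between bar and foo changes; the two field separators are left alone.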


Avinash Raj answered Oct 07 '22 08:10