
Regex for matching quotes and single quotes

I'm currently writing a parser for ColdFusion code. I'm using a regex (in C#) to extract the value of the datasource attribute of the cfquery tag.

For the time being, the regex is the following: <cfquery\s.*datasource\s*=\s*(?:'|")(.*)(?:'|")

It works well for strings like <cfquery datasource="myDS" or <cfquery datasource='myDS'

But it misbehaves when parsing strings like <cfquery datasource="#GetSourceName('myDS')#"

Obviously the part of the regex (?:'|") is the cause. Is there a way to only match single quote when the first match was a single quote? And only match the double quote when the first match was a double quote?

Thanks in advance!

E. Jaep asked Jun 15 '11 20:06



2 Answers

Edit: I think this should work in C#; you just need a backreference:

datasource\s*=\s*('|")(.*)(?:\1)

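Note that $1 is replacement-string syntax in .NET, not pattern syntax. Inside a pattern, $ is the end-of-input anchor, so a variant such as (?:$1) will not act as a backreference; use \1.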

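The pattern with \1 matches datasource="#GetSourceName('myDS')#" because \1 is a backreference to whatever the first capture group matched, so the closing quote must be the same character as the opening quote.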

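Of course, the opening quote must be in a capturing group for this to work; you cannot make it non-capturing with ?: and still reference it. The attribute value therefore ends up in group 2. Also, you may want to make the quantifier lazy (.*? instead of .*) so the match stops at the first matching closing quote rather than running on to a later one.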
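As a rough illustration of the backreference approach, here is a minimal sketch. It is written in Python rather than C# purely for brevity; the \1 backreference and the lazy .*? quantifier are also supported by the .NET regex engine. The function name extract_datasource and the character class ['"] (used here in place of ('|")) are my own choices, not part of the original answer.

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close.
# The lazy .*? stops at the first closing quote that matches the opening one.
DATASOURCE_RE = re.compile(r"""<cfquery\s.*?datasource\s*=\s*(['"])(.*?)\1""")


def extract_datasource(tag):
    """Return the datasource attribute value, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    m = DATASOURCE_RE.search(tag)
    return m.group(2) if m else None


samples = [
    '<cfquery datasource="myDS">',
    "<cfquery datasource='myDS'>",
    '''<cfquery datasource="#GetSourceName('myDS')#">''',
]
for s in samples:
    print(extract_datasource(s))
```

The single quotes inside the third sample no longer end the match, because only a double quote can satisfy \1 there.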

NullRef answered Oct 23 '22 23:10


I would suggest using two different regexes if possible, or splitting the regex in a different way.

For a single regex, considering the question @Mike posted, you could use ("[^"]*")|('[^']*') and then strip the quotes from whichever group matched.
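A minimal sketch of the alternation idea, assuming the pattern is prefixed with datasource\s*=\s* so that it only picks up that attribute (the prefix and the helper name extract_datasource_alt are my additions). It is in Python for brevity; the pattern uses only basic syntax that should carry over to C# unchanged.

```python
import re

# One alternative per quote style. Each alternative forbids its own quote
# character inside the value, so the other quote style may appear freely.
QUOTED_RE = re.compile(r"""datasource\s*=\s*(?:("[^"]*")|('[^']*'))""")


def extract_datasource_alt(tag):
    """Return the attribute value with its surrounding quotes stripped.

    Hypothetical helper for illustration only.
    """
    m = QUOTED_RE.search(tag)
    if m is None:
        return None
    # Exactly one of the two groups participates in a successful match.
    quoted = m.group(1) if m.group(1) is not None else m.group(2)
    return quoted[1:-1]  # drop the opening and closing quote


print(extract_datasource_alt('''<cfquery datasource="#GetSourceName('myDS')#">'''))
```

The trade-off compared with a backreference is that the caller has to check which group matched.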

The other potential way of doing this is by using lookahead/lookbehind, but that tends to get messy and isn't universally supported.

Greg Jackson answered Oct 23 '22 23:10