
Regex, ignoring pattern if it's in quotes

I'm writing a very simple script parser as part of a school project, and while it's not required, I'm curious whether it can be done with only a regular expression.

The syntax is similar to ASP, where a script begins with <% and ends with %>.

It only supports one command "pr", which is the same as echo or Response.Write.

Right now I'm using this regular expression to find script blocks:

(<%\s*([\s\S]*?)\s*%>)

But if I have a command like this:

<% pr "%>"; %>

...it obviously only matches:

<% pr "%>

Is there a way, using pure regex, to ignore closing tags that are within quotes? My main worry is that it might also ignore tags that sit between two quote characters but are actually outside any quoted string, if that makes sense. For example...

<% pr "hello world"; %> "

Technically the closing tag is surrounded by quotes, but it's not inside an "open" then "close" quote, rather the other way around.

If this is possible with regex that would be pretty neat, otherwise I suspect that if I wanted to support this functionality I would have to manually iterate through the incoming text and parse the blocks out myself, which is no big deal really either.

Thanks!

ARW asked Dec 12 '12 15:12



1 Answer

I think this one should suit your needs: <%(".*?"|.*?)*?%> (see the Demo).
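To try a pattern of this shape against the inputs from the question, here is a minimal Python sketch. Note that it uses a slightly stricter variant of the pattern, which is my own assumption and not the exact regex above: each step consumes either a complete quoted string or a single character that is not a quote, so a %> inside quotes can never end the block. The names BLOCK and find_blocks are made up for illustration, and escaped quotes inside strings are not handled.

```python
import re

# Assumed variant, not the answer's exact pattern: each step of the inner
# loop consumes either a complete "quoted string" or one non-quote character.
# The loop is lazy, so it stops at the first %> that is outside quotes.
BLOCK = re.compile(r'<%\s*((?:"[^"]*"|[^"])*?)\s*%>')

def find_blocks(text):
    """Return the body of every <% ... %> block found in text."""
    return [m.group(1) for m in BLOCK.finditer(text)]

# The two inputs from the question.
print(find_blocks('<% pr "%>"; %>'))
print(find_blocks('<% pr "hello world"; %> "'))
```

Because the second alternative cannot match a quote character, every opening quote has to be paired with the next closing quote before the engine is allowed to look for %> again.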

Explanation:

While .* matches as long as possible, .*? matches as little as possible.

For example (using pseudo-code),

"#foo# #bar#".matches(/#(.*)#/).group(1) // will return ["foo# #bar"]

while

"#foo# #bar#".matches(/#(.*?)#/).group(1) // will return ["foo", "bar"]
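The same greedy versus lazy comparison can be run for real. This is a small sketch in Python (my choice of language, the pseudo-code above is not tied to one), using re.findall to collect the captured group of every match.

```python
import re

text = "#foo# #bar#"

# Greedy: .* grabs as much as possible, so there is a single match
# spanning from the first # to the last #.
greedy = re.findall(r"#(.*)#", text)

# Lazy: .*? grabs as little as possible, so each #...# pair matches on its own.
lazy = re.findall(r"#(.*?)#", text)

print(greedy)  # ['foo# #bar']
print(lazy)    # ['foo', 'bar']
```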

sp00m answered Nov 15 '22 15:11