
Negating a backreference in Regular Expressions

If a string has this expected format:

value = "hello and good morning" 

The " (double quote) might also be ' (single quote), and the closing character (' or ") will always be the same as the opening one. I want to match the string between the quotation marks. I tried this pattern:

\bvalue\s*=\s*(["'])([^\1]*)\1 

(The two \s* are there to allow any whitespace around the = sign.)

The first capturing group (inside the first pair of parentheses) should match the opening quotation mark, which is either ' or ". After that, I intend to allow any number of characters that are not whatever the first group captured, and then I expect the captured character again (the closing quotation mark).

(The required string should be captured in the second capturing group.)
This doesn't work though.

This does:

\bvalue\s*=\s*(['"])([^"']*)["'] 

but I want to make sure that both the opening and closing quotation mark (either double or single) are the same.
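
For reference, the behaviour of the two patterns above can be reproduced with a short Python sketch. It assumes Python's re module, where \1 inside a character class is read as a numeric escape for the character U+0001 rather than as a backreference; other engines may treat [^\1] differently. The sample strings are made up for illustration.

```python
import re

# First attempt: inside [...] the \1 is NOT a backreference in Python's re;
# it is a numeric escape, so [^\1] means "any character except U+0001".
attempt = re.compile(r'''\bvalue\s*=\s*(["'])([^\1]*)\1''')

# Working-but-loose pattern: the closing quote may differ from the opening one.
loose = re.compile(r'''\bvalue\s*=\s*(['"])([^"']*)["']''')

# Two quoted strings on one line: the greedy [^\1]* runs past the first
# closing quote and only stops at the last one.
attempt_capture = attempt.search('value = "a" and "b"').group(2)

# Mismatched quotes: the loose pattern still reports a match.
loose_match = loose.search("value = \"hello'")

print(repr(attempt_capture))
print(loose_match.group(2) if loose_match else None)
```

In this sketch the first pattern over-matches because nothing stops it at the first closing quote, and the second pattern accepts an opening " closed by a ', which is exactly what the question wants to rule out.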


EDIT
The goal was basically to get the opening tag of an anchor that has a certain class-name included within its class attribute, and I wanted to cover the rare occasion of the class attribute including a (') or a (").

Following all of the advice here, I used the pattern:

<\s*\ba\b[^<>]+\bclass\s*=\s*("|'|\\"|\\')(?:(?!\1).)*\s*classname\s*(?:(?!\1).)*\1[^>]*> 

Meaning:
Find a tag-open sign (<).
Allow any spaces.
Find the word a.
Allow one or more characters that are neither < nor >.
Find "class (any spaces) = (any spaces)".
Get the opening quote, one of the following: (" or ' or \" or \').
From Alan Moore's answer: allow any characters that are not the opening quote.
Find classname.
Allow any characters that are not the opening quote.
Find the closing quote, which is the same as the opening one.
Allow any characters that do not close the tag.
Find the tag-close sign (>).
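
As a quick check, the pattern can be run against a few tags. This is only a sketch using Python's re module; the sample tags and the helper name find_anchor_tag are hypothetical, and classname stands in for the real class name.

```python
import re

# The pattern from the EDIT, unchanged. A triple-quoted raw string is used
# because the pattern contains both kinds of quotation mark.
ANCHOR_WITH_CLASS = re.compile(
    r'''<\s*\ba\b[^<>]+\bclass\s*=\s*("|'|\\"|\\')(?:(?!\1).)*\s*classname\s*(?:(?!\1).)*\1[^>]*>'''
)


def find_anchor_tag(html):
    """Return the first matching opening <a ...> tag, or None (hypothetical helper)."""
    match = ANCHOR_WITH_CLASS.search(html)
    return match.group(0) if match else None


print(find_anchor_tag('<p><a href="x" class="foo classname bar">link</a></p>'))
print(find_anchor_tag("<a class='classname'>"))
print(find_anchor_tag('<a class="other">'))
```

One thing this sketch makes visible: classname is not wrapped in word boundaries, so a class such as myclassname would also match. Adding \b on both sides of classname would be one way to tighten that, if it matters for the input at hand.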

Yuval A. asked Nov 08 '11 19:11


2 Answers

Instead of a negated character class, you have to use a negative lookahead:

\bvalue\s*=\s*(["'])(?:(?!\1).)*\1 

(?:(?!\1).)* consumes one character at a time, after the lookahead has confirmed that the character is not whatever was matched by the capturing group, (["']). A character class, negated or not, can only match one character at a time, and in most flavors \1 inside square brackets is not treated as a backreference at all. As far as the regex engine knows, \1 could represent any number of characters, and there's no way to convince it that \1 will only contain " or ' in this case. So you have to go with the more general (and less readable) solution. To capture the text between the quotes in a second group, wrap the whole loop in parentheses: ((?:(?!\1).)*).
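
A minimal sketch of this approach in Python's re module follows. The extra pair of parentheses around the loop (so that the quoted text lands in group 2) and the helper name extract_value are additions for illustration, not part of the pattern above.

```python
import re

# (["']) captures the opening quote; (?!\1). matches one character only if
# it is not that same quote; the outer (...) collects those characters as
# group 2; the final \1 requires the identical closing quote.
QUOTED_VALUE = re.compile(r'''\bvalue\s*=\s*(["'])((?:(?!\1).)*)\1''')


def extract_value(text):
    """Return the text between matching quotes, or None (hypothetical helper)."""
    match = QUOTED_VALUE.search(text)
    return match.group(2) if match else None


print(extract_value('value = "hello and good morning"'))
print(extract_value("value = 'say \"hi\"'"))
print(extract_value("value = \"mismatched'"))
```

Because the lookahead is checked before every character, the other kind of quote is allowed inside the string, while a string opened with " and closed with ' is rejected.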

Alan Moore answered Oct 05 '22 12:10


You can use:

\bvalue\s*=\s*(['"])(.*?)\1 
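
Here is a small sketch of how this lazy pattern behaves in Python's re module. The helper name extract_value_lazy and the sample strings are made up for illustration. The re.DOTALL variant is an optional extra, shown because . does not match a newline by default in Python.

```python
import re

# .*? grows one character at a time and stops at the first position where
# \1 (the same quote that opened the string) can match.
LAZY_VALUE = re.compile(r'''\bvalue\s*=\s*(['"])(.*?)\1''')

# Same pattern, but with re.DOTALL so that . also matches newlines.
LAZY_VALUE_MULTILINE = re.compile(r'''\bvalue\s*=\s*(['"])(.*?)\1''', re.DOTALL)


def extract_value_lazy(text, pattern=LAZY_VALUE):
    """Return the text between matching quotes, or None (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(2) if match else None


print(extract_value_lazy('value = "a" and "b"'))
print(extract_value_lazy("value = 'say \"hi\"'"))
print(extract_value_lazy('value = "line1\nline2"', LAZY_VALUE_MULTILINE))
```

When the backreference directly follows the lazy quantifier like this, the result is the same as with the lookahead version: the capture ends at the first quote that equals the opening one.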

codaddict answered Oct 05 '22 12:10