
Regexp matching a string - positive lookahead

Regexp: (?=(\d+))\w+\1
String: 456x56

Hi,

I don't understand how this regex matches "56x56" in the string "456x56". This is how I expect it to work:

  1. The lookahead, (?=(\d+)), captures 456 and stores it in \1, the group for (\d+).
  2. The word-character pattern, \w+, matches the whole string ("456x56").
  3. \1, which is 456, should then follow whatever \w+ matched.
  4. After backtracking through the string, the engine should not find a match, because there is no "456" preceded by a word character.

However, the regexp matches 56x56.

Suresh asked Jan 07 '12 13:01




1 Answer

5) The regex engine concludes that it cannot find a match when it starts searching from 4, so it advances one character and tries again. The lookahead is evaluated afresh at the new position, so this time it captures only two digits (56) into \1, and the engine ends up matching 56x56.
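The retry behaviour can be observed directly. The following is a minimal sketch in Python, assuming its `re` module behaves like the engine in the question for this pattern; `trace_attempts` is a hypothetical helper written for illustration, not part of any library. It tries the pattern at each start offset in turn, which is roughly what a search does internally.

```python
import re

# The pattern and subject string from the question.
PATTERN = re.compile(r'(?=(\d+))\w+\1')
TEXT = "456x56"

def trace_attempts(pattern, text):
    """Try the pattern at every start offset (hypothetical helper)."""
    attempts = []
    for pos in range(len(text)):
        # Pattern.match(text, pos) only attempts a match beginning exactly at pos.
        m = pattern.match(text, pos)
        attempts.append((pos,
                         m.group(0) if m else None,   # whole match
                         m.group(1) if m else None))  # what \1 captured
    return attempts

for pos, matched, captured in trace_attempts(PATTERN, TEXT):
    print(pos, matched, captured)

# search() reports the first start offset at which an attempt succeeds.
first = PATTERN.search(TEXT)
print(first.span(), first.group(0), first.group(1))
```

The attempt at offset 0 (starting from 4) fails, and the attempt at offset 1 succeeds with 56 in \1, which is why the overall match is 56x56.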

If you want to match only whole strings, use ^(?=(\d+))\w+\1$

^ matches the beginning of the string
$ matches the end of the string
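As a quick check of the anchored version, here is a small Python sketch; `matches_whole` is a hypothetical helper name, and Python's `re` module is assumed to stand in for the engine used in the question.

```python
import re

# Anchored pattern from the answer: the match must span the whole string.
ANCHORED = re.compile(r'^(?=(\d+))\w+\1$')

def matches_whole(text):
    """Return True if the entire string matches (hypothetical helper)."""
    # Note: in Python, $ also matches just before a trailing newline;
    # \Z could be used instead if that matters.
    return ANCHORED.search(text) is not None

print(matches_whole("456x56"))  # the engine can no longer skip the leading 4
print(matches_whole("56x56"))
```

With ^ in place, the engine may only start at the first character, so "456x56" is rejected while "56x56" is still accepted.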
Amarghosh answered Sep 23 '22 18:09