Having trouble understanding capturing groups and backreferences

Tags:

regex

Wishing to put some order into my knowledge of regular expressions, I decided to work through a book about them, Introducing Regular Expressions. I know it's silly, but one of the introductory examples doesn't make sense to me.

(\d)\d\1

Sample text:

123-456-7890

(should capture the first number, 123)

Can anyone explain what is going on in here?

As far as I can figure out, the first \d captures the number 123. The \1 backreferences (marks) the group for later use. The parentheses limit the scope of the group. But what does the second \d do?

Simple explanations, as if to a small child or a golden retriever, are preferred.

Rook asked Feb 19 '14 12:02


2 Answers

\d matches just one digit.

This regular expression doesn't match anything in the string "123-456-7890", but it would match "323" (which could be part of a larger string, for example "323-456-7890"):

 (\d) : first digit ("3")
 \d   : another digit ("2")
 \1   : first group (which was "3")
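One quick way to check this is to run the pattern. The sketch below uses Python's built-in re module; other engines that support backreferences should treat this simple pattern the same way, but that is an assumption, not something tested here.

```python
import re

# The pattern from the question: one digit (remembered as group 1),
# any second digit, then the exact same character group 1 captured.
pattern = re.compile(r"(\d)\d\1")

m = pattern.search("323-456-7890")
print(m.group(0))  # whole match: 323
print(m.group(1))  # group 1: 3

# No run of three digits in "123-456-7890" starts and ends with the same digit.
print(pattern.search("123-456-7890"))  # None
```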

Now, if your book claims that (\d)\d\1 should capture "123" in "123-456-7890", then it may contain an error.
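For contrast, a pattern that does return "123" from this sample simply drops the backreference. A minimal sketch with Python's re module; \d{3} is an illustrative choice of mine, not the book's example:

```python
import re

text = "123-456-7890"

# (\d)\d\1 requires the third digit to repeat the first, so nothing matches here.
no_match = re.search(r"(\d)\d\1", text)
print(no_match)  # None

# Three digits with no repetition constraint: the leftmost match is "123".
first_block = re.search(r"\d{3}", text).group(0)
print(first_block)  # 123

# The same thing with a capturing group, if the digits are needed later.
captured = re.search(r"(\d{3})", text).group(1)
print(captured)  # 123
```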

Denys Séguret answered Oct 04 '22 20:10


(\d)\d\1 step by step:

  1. The first \d matches one digit.
  2. The parentheses () mark this as a capturing group; it is the first one, so the digit is remembered as "group 1".
  3. The second \d matches another digit (any digit at all).
  4. \1 says "here is the value from our previous group 1", that is, the digit that was matched in step 1.
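The four steps can be emulated by hand to make the mechanics concrete. This is a sketch, and `find_xyx` is a hypothetical helper name: it scans left to right like a regex search and returns the first match together with group 1. It uses `str.isdecimal()`, which accepts the same Unicode category (Nd) that Python's `\d` matches for str patterns.

```python
def find_xyx(text):
    # Hypothetical helper: emulates the pattern (\d)\d\1 by hand.
    # Returns (whole_match, group_1) for the leftmost match, or None.
    for i in range(len(text) - 2):
        first, second, third = text[i], text[i + 1], text[i + 2]
        # Steps 1-2: the first \d matches one digit, and the
        # parentheses remember it as group 1.
        if not first.isdecimal():
            continue
        group1 = first
        # Step 3: the second \d accepts any digit.
        if not second.isdecimal():
            continue
        # Step 4: \1 must be exactly the text group 1 captured.
        if third == group1:
            return text[i:i + 3], group1
    return None


print(find_xyx("323-456-7890"))  # ('323', '3')
print(find_xyx("123-456-7890"))  # None
```

Note that \1 compares against the captured text, not against "any digit": that is the whole difference between (\d)\d\1 and \d\d\d.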

So, as dystroy already said, the regex matches a sequence of three digits in which the first and the third are equal.
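To list every such sequence in a longer string, Python's `re.finditer` works well. One pitfall is worth showing: when a pattern contains a capturing group, `re.findall` returns the group's text rather than the whole match. The sample string is made up for illustration.

```python
import re

text = "121 454 123 7877"

# finditer yields match objects; group(0) is the whole three-digit match.
# Matches are non-overlapping and found left to right.
whole = [m.group(0) for m in re.finditer(r"(\d)\d\1", text)]
print(whole)  # ['121', '454', '787']

# With a capturing group in the pattern, findall returns only group 1.
groups = re.findall(r"(\d)\d\1", text)
print(groups)  # ['1', '4', '7']
```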

piet.t answered Oct 04 '22 20:10