Is it possible to match a character repetition with regex? How?

Tags:

python

regex

Question:
Is it possible, with regex, to match a word that contains the same character in different positions?

Condition:
All words have the same length. You know the positions of the repeated character (for example the 1st, the 2nd and the 4th), but you don't know what the character is.

Examples:
Using lowercase 6-character words, I'd like to match the words whose 3rd and 4th characters are the same.

parrot <- match for double r
follia <- match for double l 
carrot <- match for double r
mattia <- match for double t
rettoo <- match for double t
melone <- doesn't match

I can't use a quantifier like [a-z]{2}, because it matches any sequence of two characters, not necessarily two identical ones. And what if I want the 2nd and 4th positions instead of the 3rd and 4th?

Is it possible to do what I want with regex? If yes, how can I do that?

EDIT:
As asked in the comments: I'm using Python.

Andrea Ambu asked Jun 21 '09 13:06





1 Answer

You can use a backreference to do this:

(.)\1

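This matches two consecutive occurrences of the same character: (.) captures any single character, and \1 requires the next character to be identical to the captured one. Used with re.search, it finds such a pair anywhere in the word, not at a particular position.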
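Since the question fixes the positions, one option is to pad the backreference with `.` wildcards and match the whole word with `fullmatch`, so the pair can only appear where you put it. The sketch below assumes 6-character words as in the question; `banana` is an extra word I added (not from the question) so the 2nd/4th pattern has something to match, and `matches` is just a hypothetical helper name.

```python
import re

# Sample words from the question, plus "banana" (added for illustration).
words = ["parrot", "follia", "carrot", "mattia", "rettoo", "melone", "banana"]

# 3rd and 4th equal: two wildcards, capture the 3rd char,
# require the 4th to repeat it (\1), then two more wildcards.
same_3_4 = re.compile(r"..(.)\1..")

# 2nd and 4th equal: one wildcard, capture the 2nd char,
# any 3rd char, 4th must repeat the 2nd, then two more wildcards.
same_2_4 = re.compile(r".(.).\1..")

def matches(pattern, word):
    """Hypothetical helper: return the repeated character, or None."""
    m = pattern.fullmatch(word)  # fullmatch anchors both ends of the word
    return m.group(1) if m else None

for word in words:
    print(word, matches(same_3_4, word), matches(same_2_4, word))
```

Because the whole word must match, the position of each group is fixed by how many `.` precede it; moving the `(.)` and `\1` is all it takes to test other positions.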


Edit: Here's a Python example:

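import re

# (.) captures any single character; \1 requires that same character again
regexp = re.compile(r"(.)\1")
data = ["parrot", "follia", "carrot", "mattia", "rettoo", "melone"]

for word in data:  # "word" instead of "str" to avoid shadowing the built-in
    match = regexp.search(word)
    if match:
        print(word, "<- match for double", match.group(1))
    else:
        print(word, "<- doesn't match")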
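The question's general condition (a known word length and a known set of positions, such as the 1st, 2nd and 4th) can be handled by generating the pattern instead of writing it by hand. This is a minimal sketch under those assumptions: `build_pattern` is a hypothetical helper, not part of the `re` module, and positions are 1-based. It captures the first listed position with `(.)`, uses `\1` at every other listed position, and `.` everywhere else.

```python
import re

def build_pattern(length, positions):
    """Hypothetical helper: compile a regex that fully matches words of
    exactly `length` characters whose characters at the given 1-based
    `positions` are all the same."""
    positions = sorted(set(positions))
    if not positions or positions[0] < 1 or positions[-1] > length:
        raise ValueError("positions must be within 1..length")
    parts = []
    for i in range(1, length + 1):
        if i == positions[0]:
            parts.append("(.)")   # capture the first repeated position
        elif i in positions:
            parts.append(r"\1")   # later positions must repeat group 1
        else:
            parts.append(".")     # any character
    return re.compile("".join(parts))

words = ["parrot", "follia", "carrot", "mattia", "rettoo", "melone"]
pattern = build_pattern(6, [3, 4])  # builds r"..(.)\1.."
print([w for w in words if pattern.fullmatch(w)])
# ['parrot', 'follia', 'carrot', 'mattia', 'rettoo']
```

For the 1st, 2nd and 4th positions, `build_pattern(6, [1, 2, 4])` produces `(.)\1.\1..`. Use `fullmatch` (or add `^...$`) with these patterns; with `search` alone, the fixed offsets would no longer be tied to the start of the word.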
Gumbo answered Oct 13 '22 07:10
