 

Regular expression for repeating sequence

Tags:

python

regex

I'd like to match three-character sequences made up only of the letters 'a', 'b' and 'c', separated by commas (the last group is not followed by a comma).

Examples:

abc,bca,cbb
ccc,abc,aab,baa
bcb

I have written the following regular expression:

re.match('([abc][abc][abc],)+', "abc,defx,df")

However, it doesn't work correctly, as these examples show:

>>> print(bool(re.match('([abc][abc][abc],)+', "abc,defx,df")))  # 'defx' in second group
True
>>> print(bool(re.match('([abc][abc][abc],)+', "axc,defx,df")))  # 'x' in first group
False

It seems to check only the first group of three letters and ignore the rest. How do I write this regular expression correctly?
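One way to see what the original pattern actually accepts is to look at the matched text. `re.match` only requires a match at the start of the string, and `+` is already satisfied by the first `abc,`, so nothing after it is ever checked. A minimal sketch, assuming Python 3:

```python
import re

# The pattern from the question: one or more "three letters + comma" groups,
# with nothing anchoring it to the end of the string.
pattern = r'([abc][abc][abc],)+'

m = re.match(pattern, "abc,defx,df")
print(m.group(0))  # only the leading "abc," is consumed; "defx,df" is ignored

# Here the very first group contains 'x', so the match fails immediately.
print(re.match(pattern, "axc,defx,df"))  # None
```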

asked Dec 15 '11 07:12 by scdmb





2 Answers

Try the following regex:

^[abc]{3}(,[abc]{3})*$

^...$ from the start to the end of the string
[...] one of the given characters
...{3} exactly three repetitions of the preceding element
(...)* zero or more repetitions of the group in parentheses
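A minimal sketch of this pattern applied to the examples from the question, assuming Python 3; `is_valid` is just an illustrative helper name, not part of the answer:

```python
import re

# Anchored pattern from the answer: one triple, then zero or more ",triple".
PATTERN = re.compile(r'^[abc]{3}(,[abc]{3})*$')

def is_valid(s):
    """Return True if s is comma-separated triples of the letters a, b, c."""
    return PATTERN.match(s) is not None

for s in ["abc,bca,cbb", "ccc,abc,aab,baa", "bcb", "abc,defx,df", "abc,"]:
    print(s, is_valid(s))
```

The first three strings are accepted; "abc,defx,df" and "abc," (trailing comma) are rejected.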

answered Sep 30 '22 18:09 by scessor



What you're asking your regex to find is "at least one triple of the letters a, b, c followed by a comma"; that's what "+" gives you. Whatever follows after that doesn't matter to the regex, because the match only has to start at the beginning of the string. You might want to include "$", which means "end of the string", to make sure the whole string consists of allowed triples. However, in its current form your regex would also demand that the last triple end in a comma, so you should explicitly code that it doesn't. Try this:

re.match('([abc][abc][abc],)*([abc][abc][abc])$', "abc,bca,cbb")

This finds any number (possibly zero) of allowed triples each followed by a comma, then a triple without a comma, then the end of the string.
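To illustrate the reasoning above, here is a small sketch (Python 3) contrasting the question's pattern with only "$" appended against the pattern from this answer:

```python
import re

# Only "$" added to the question's pattern: every triple must now end in a comma,
# so a correctly formatted string is rejected.
only_dollar = r'([abc][abc][abc],)+$'
print(re.match(only_dollar, "abc,bca,cbb"))   # None
print(re.match(only_dollar, "abc,bca,cbb,"))  # matches, but the trailing comma is unwanted

# This answer's pattern: zero or more "triple + comma", then a final triple without one.
fixed = r'([abc][abc][abc],)*([abc][abc][abc])$'
for s in ["abc,bca,cbb", "bcb", "abc,defx,df", "abc,bca,"]:
    print(s, re.match(fixed, s) is not None)
```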

Edit: including the "^" (start of string) symbol is not necessary, because the match method already checks for a match only at the beginning of the string.
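Along the same lines, Python 3.4 and later provide `re.fullmatch`, which requires the pattern to match the entire string, so neither "^" nor "$" is needed. It also differs slightly from "$": without `re.MULTILINE`, "$" also matches just before a trailing newline, so a "$"-anchored pattern accepts "abc\n" while `fullmatch` does not. A minimal sketch, assuming Python 3.4+; the helper name `is_valid_triples` is illustrative, and the non-capturing group `(?:...)` is a swapped-in detail, not from the answers:

```python
import re

TRIPLES = re.compile(r'[abc]{3}(?:,[abc]{3})*')

def is_valid_triples(s):
    # fullmatch succeeds only if the pattern consumes the whole string.
    return TRIPLES.fullmatch(s) is not None

# "$" also matches just before a final newline, so this accepts "abc\n".
print(re.match(r'[abc]{3}(?:,[abc]{3})*$', "abc\n") is not None)  # True
print(is_valid_triples("abc\n"))         # False
print(is_valid_triples("abc,bca,cbb"))   # True
```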

answered Sep 30 '22 16:09 by Sonya
