
repeating multiple characters regex

Is there a way using a regex to match a repeating set of characters? For example:

ABCABCABCABCABC

ABC{5}

I know that's wrong. But is there anything to match that effect?

Update:

Can you use nested capture groups? For example, something like (?<cap>(ABC){5})?

Falmarri asked Sep 02 '10 20:09




2 Answers

Enclose the regex you want to repeat in parentheses. For instance, if you want 5 repetitions of ABC:

(ABC){5} 

Or if you want any number of repetitions (0 or more):

(ABC)* 

Or one or more repetitions:

(ABC)+ 
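
One way to check the three quantifiers above is Python's re.fullmatch, which succeeds only when the whole string matches the pattern. This is a minimal sketch; the helper name matches and the sample strings are made up for illustration.

```python
import re

def matches(pattern, text):
    """Return True if the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# Exactly five repetitions of the group ABC
print(matches(r'(ABC){5}', 'ABC' * 5))   # True
print(matches(r'(ABC){5}', 'ABC' * 4))   # False

# Zero or more repetitions: the empty string also matches
print(matches(r'(ABC)*', ''))            # True

# One or more repetitions: the empty string does not match
print(matches(r'(ABC)+', ''))            # False
print(matches(r'(ABC)+', 'ABCABC'))      # True
```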

Edit, in response to the update:

Parentheses in regular expressions do two things. They group a sequence of items, so that you can apply an operator to the entire sequence instead of just the last item. They also capture the contents of the group, so you can extract the substring that was matched by that subexpression.

You can nest parentheses; groups are numbered by the position of their opening paren, counting from left to right. For instance:

>>> import re
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(0)
'123 ABCDEF'
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(1)
'ABCDEF'
>>> re.search('[0-9]* (ABC(...))', '123 ABCDEF 456').group(2)
'DEF'

If you want to group without capturing, use a non-capturing group, written (?:...). This is useful when the parentheses are there only so that an operator applies to a whole sequence and you don't want them to change the numbering of your capture groups. It can also be slightly faster, since the engine does not have to record the matched text.

>>> re.search('[0-9]* (?:ABC(...))', '123 ABCDEF 456').group(1)
'DEF'

So to answer your update, yes, you can use nested capture groups, or even avoid capturing with the inner group at all:

>>> re.search('((?:ABC){5})(DEF)', 'ABCABCABCABCABCDEF').group(1)
'ABCABCABCABCABC'
>>> re.search('((?:ABC){5})(DEF)', 'ABCABCABCABCABCDEF').group(2)
'DEF'
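
Two details are worth checking when the inner group is left as a capturing group. The sketch below assumes Python's re module, where a named group is spelled (?P<name>...); the (?<cap>...) spelling from the question is the form used by other engines such as .NET. It also shows that when a capturing group is repeated, the inner group keeps only its last repetition, while the outer group holds the whole run.

```python
import re

text = 'ABCABCABCABCABCDEF'

# The outer named group "cap" wraps the repeated inner group.
m = re.search(r'(?P<cap>(ABC){5})', text)

print(m.group('cap'))  # ABCABCABCABCABC (the whole run of five)
print(m.group(1))      # the same group, addressed by number
print(m.group(2))      # ABC (only the last repetition of the inner group)
```

If the inner text is never needed on its own, the non-capturing form (?:ABC){5} avoids the extra group entirely.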
Brian Campbell answered Oct 03 '22 06:10


ABC{5} matches ABCCCCC, because the quantifier {5} applies only to the item immediately before it, the C. To match five ABCs, use (ABC){5}; the parentheses group the characters so that the quantifier applies to the whole sequence. You can also give a range of repetitions, such as (ABC){3,5}, which matches ABCABCABC, ABCABCABCABC, and ABCABCABCABCABC.

(ABC){1,} means 1 or more repetitions, which is exactly the same as (ABC)+.

(ABC){0,} means 0 or more repetitions, which is exactly the same as (ABC)*.
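
These claims can be spot-checked in Python. The sketch below is an illustration only: the variable names and sample strings are made up, and the equivalence checks cover just a handful of inputs rather than proving anything in general.

```python
import re

# Without parentheses, {5} applies only to the preceding C.
print(bool(re.fullmatch(r'ABC{5}', 'ABCCCCC')))    # True
print(bool(re.fullmatch(r'ABC{5}', 'ABC' * 5)))    # False

# (ABC){3,5} accepts three, four, or five repetitions.
counts = [n for n in range(1, 8)
          if re.fullmatch(r'(ABC){3,5}', 'ABC' * n)]
print(counts)                                      # [3, 4, 5]

# {1,} agrees with +, and {0,} agrees with *, on a few sample inputs.
samples = ['', 'ABC', 'ABCABC', 'ABCAB']
same_plus = all(
    bool(re.fullmatch(r'(ABC){1,}', s)) == bool(re.fullmatch(r'(ABC)+', s))
    for s in samples
)
same_star = all(
    bool(re.fullmatch(r'(ABC){0,}', s)) == bool(re.fullmatch(r'(ABC)*', s))
    for s in samples
)
print(same_plus, same_star)                        # True True
```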

Zafer answered Oct 03 '22 07:10