
Limit number of character of capturing group

Tags:

regex

Let's say I have this text: "AAAA1 AAA11 AA111AA A1111 AAAAA AAAA1111".

I want to find all occurrences matching these 3 criteria:
- 1 to 4 capital letters
- followed by 1 to 4 digits
- at most 5 characters in total

So the matches would be:
{"AAAA1", "AAA11", "AA111", "A1111", "AAAA1"}

I tried

([A-Z]{1,4}[0-9]{1,4}){5}

but I knew it would fail, since it looks for five repetitions of my group.

Is there a way to limit what the group matches to 5 characters?

Thanks

sabatmonk asked Mar 26 '15 16:03




1 Answer

You can limit the character count with a lookahead while checking the pattern with the matching part of the regex.

If you can split the input by whitespace you can use:

^(?=.{2,5}$)[A-Z]{1,4}[0-9]{1,4}$

See demo here.
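
As a minimal sketch of the split-by-whitespace approach, assuming Python's re module (the question does not name a regex flavor, and the names TOKEN, text and matches are only illustrative), the anchored pattern can be applied to each token:

```python
import re

# Anchored pattern from the answer: the lookahead caps the whole token at
# 2 to 5 characters, the rest checks "letters then digits".
TOKEN = re.compile(r"^(?=.{2,5}$)[A-Z]{1,4}[0-9]{1,4}$")

text = "AAAA1 AAA11 AA111AA A1111 AAAAA AAAA1111"

# Split on whitespace first, then keep only the tokens that match entirely.
matches = [tok for tok in text.split() if TOKEN.match(tok)]
print(matches)  # ['AAAA1', 'AAA11', 'A1111']
```

Note that this keeps only whole tokens, so "AA111AA" and "AAAA1111" are rejected rather than truncated to their first five characters.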

If you cannot split by whitespace, you can use a capturing group, for example (?:^| )(?=.{2,5}(?=$| ))([A-Z]{1,4}[0-9]{1,4})(?=$| ), or use a lookbehind or \K to handle the token boundary, depending on your regex flavor (see demo).
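
A rough sketch of the capturing-group variant, again assuming Python's re module (PATTERN, text and matches are illustrative names). Because the pattern has exactly one capturing group, findall returns only the captured tokens, without the leading space:

```python
import re

# The non-capturing prefix consumes the start of the string or one space,
# the lookaheads enforce the 2-5 length limit and the trailing boundary,
# and the single capturing group holds the token itself.
PATTERN = re.compile(r"(?:^| )(?=.{2,5}(?=$| ))([A-Z]{1,4}[0-9]{1,4})(?=$| )")

text = "AAAA1 AAA11 AA111AA A1111 AAAAA AAAA1111"

matches = PATTERN.findall(text)
print(matches)  # ['AAAA1', 'AAA11', 'A1111']
```

The trailing boundary is a lookahead rather than a consumed space, so the space is still available as the prefix of the next token and adjacent tokens are not skipped.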


PREVIOUS ANSWER: it wrongly matches A1A1A, and was updated after @a_guest's remark.

You can use a lookahead to check for your pattern, while limiting the character count with the matching part of the regex:

(?=[A-Z]{1,4}[0-9]{1,4}).{2,5}

See demo here.
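
To illustrate why this earlier pattern was replaced, here is a small sketch assuming Python's re module (OLD, false_positive and sample_matches are illustrative names). The lookahead only checks that the text starts with letters followed by a digit, while .{2,5} then consumes any characters at all:

```python
import re

# Earlier pattern: the lookahead validates only the beginning of the match,
# and ".{2,5}" accepts whatever characters come next.
OLD = re.compile(r"(?=[A-Z]{1,4}[0-9]{1,4}).{2,5}")

# False positive: letters and digits alternate, yet the string is accepted.
false_positive = OLD.findall("A1A1A")
print(false_positive)  # ['A1A1A']

# On the sample text from the question it happens to return the expected list.
sample_matches = OLD.findall("AAAA1 AAA11 AA111AA A1111 AAAAA AAAA1111")
print(sample_matches)  # ['AAAA1', 'AAA11', 'AA111', 'A1111', 'AAAA1']
```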

Robin answered Oct 06 '22 04:10