Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to capture multiple repeated groups?

Tags:

regex

I need to capture multiple groups of the same pattern. Suppose, I have the following string:

HELLO,THERE,WORLD

And I've written the following pattern

^(?:([A-Z]+),?)+$

What I want it to do is to capture every single word, so that Group 1 is : "HELLO", Group 2 is "THERE" and Group 3 is "WORLD". What my regex is actually capturing is only the last one, which is "WORLD".

I'm testing my regular expression here and I want to use it with Swift (maybe there's a way in Swift to get intermediate results somehow, so that I can use them?)

UPDATE: I don't want to use split. I just need to now how to capture all the groups that match the pattern, not only the last one.

like image 569
phbelov Avatar asked Sep 26 '22 18:09

phbelov


People also ask

How do I capture a group in regex?

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".

How do you repeat a pattern in regex?

A repeat is an expression that is repeated an arbitrary number of times. An expression followed by '*' can be repeated any number of times, including zero. An expression followed by '+' can be repeated any number of times, but at least once.

What is non capturing group in regex?

Non-capturing groups are important constructs within Java Regular Expressions. They create a sub-pattern that functions as a single unit but does not save the matched character sequence. In this tutorial, we'll explore how to use non-capturing groups in Java Regular Expressions.

How does regex replace work?

The REGEXREPLACE( ) function uses a regular expression to find matching patterns in data, and replaces any matching values with a new string. standardizes spacing in character data by replacing one or more spaces between text characters with a single space.


2 Answers

With one group in the pattern, you can only get one exact result in that group. If your capture group gets repeated by the pattern (you used the + quantifier on the surrounding non-capturing group), only the last value that matches it gets stored.

You have to use your language's regex implementation functions to find all matches of a pattern, then you would have to remove the anchors and the quantifier of the non-capturing group (and you could omit the non-capturing group itself as well).

Alternatively, expand your regex and let the pattern contain one capturing group per group you want to get in the result:

^([A-Z]+),([A-Z]+),([A-Z]+)$
like image 100
Byte Commander Avatar answered Oct 19 '22 04:10

Byte Commander


The key distinction is repeating a captured group instead of capturing a repeated group.

As you have already found out, the difference is that repeating a captured group captures only the last iteration. Capturing a repeated group captures all iterations.

In PCRE (PHP):

((?:\w+)+),?
Match 1, Group 1.    0-5      HELLO
Match 2, Group 1.    6-11     THERE
Match 3, Group 1.    12-20    BRUTALLY
Match 4, Group 1.    21-26    CRUEL
Match 5, Group 1.    27-32    WORLD

Since all captures are in Group 1, you only need $1 for substitution.

I used the following general form of this regular expression:

((?:{{RE}})+)

Example at regex101

like image 32
ssent1 Avatar answered Oct 19 '22 04:10

ssent1