
Regex: capture paired curly braces

Tags:

regex

php

I want to capture matched curly braces.

For example:

Some example text with \added[author]{text with curly braces{some text}..}

Some example text with \added[author]{text without curly braces}

Some example text with \added[author]{text with {}and {} and {}curly braces{some text}..}

Some example text with \added[author]{text with {}and {} and {}curly braces{some text}..} and extented text with curly braces {}

Expected output:

Some example text with text with curly braces{some text}..

Some example text with text without curly braces

Some example text with text with {}and {} and {}curly braces{some text}..

Some example text with text with {}and {} and {}curly braces{some text}.. and extented text with curly braces {}

In other words, I want to capture the text between \added[...]{ and its matching closing brace }. The problem is that I don't know how to make my regex stop at the corresponding closing brace.

I tried:

       "/\\\\added\\[.*?\\]{(.[^{]*?)}/s"

I know it fails as soon as a { appears inside the text, but I have no idea how to write a regex that matches only a balanced pair of curly braces.
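To make the failure concrete, here is a small sketch that runs the attempted pattern (unchanged, in the same double-quoted PHP string) against the first two sample strings. The replacement '$1' is an assumption, since the question does not say how the capture is used. Because [^{]*? can never step over a {, the nested string produces no match and comes back untouched, while the brace-free string is rewritten as intended.

```php
<?php
// The attempted pattern, exactly as written in the question.
$attempt = "/\\\\added\\[.*?\\]{(.[^{]*?)}/s";

$nested = 'Some example text with \added[author]{text with curly braces{some text}..}';
$flat   = 'Some example text with \added[author]{text without curly braces}';

// Assumed replacement: keep only the captured text.
$nestedResult = preg_replace($attempt, '$1', $nested); // no match: string unchanged
$flatResult   = preg_replace($attempt, '$1', $flat);   // match: \added[author]{...} replaced

echo $nestedResult, "\n";
echo $flatResult, "\n";
```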

asked Sep 01 '15 10:09 by Learning




2 Answers

To match paired braces you'll want to use a recursive subpattern.
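As a minimal sketch of what a recursive subpattern does: (?R) below re-enters the whole pattern, so each match is a brace block whose inner braces are themselves balanced. The input string is a made-up example, not taken from the question.

```php
<?php
// (?R) re-enters the whole pattern, so a match is a {...} block in which
// every nested brace is itself part of a balanced block.
$balanced = '/\{(?:[^{}]++|(?R))*\}/';

// Hypothetical input, not taken from the question.
preg_match_all($balanced, 'a {b {c} d} e {f}', $m);

echo implode(' | ', $m[0]), "\n"; // {b {c} d} | {f}
```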


Example:

$regex = <<<'REGEX'
/
\\added\[.*?\]                # Initial \added[author]

(                             # Group to be recursed on.
    {                         # Opening brace.

    (                         # Group for use in replacement.

        ((?>[^{}]+)|(?1))*    # Any number of substrings which can be either:
                              # - a sequence of non-braces, or
                              # - a recursive match on the first capturing group.
    )

    }                         # Closing brace.
)
/xs
REGEX;

$strings = [
    'Some example text with \added[author]{text with curly braces{some text}..}',
    'Some example text with \added[author]{text without curly braces}',
    'Some example text with \added[author]{text with {}and {} and {}curly braces{some text}..}',
    'Some example text with \added[author]{text with {}and {} and {}curly braces{some text}..} and extented text with curly braces {}'
];

foreach ($strings as $string) {
    echo preg_replace($regex, '$2', $string), "\n";
}

Output:

Some example text with text with curly braces{some text}..
Some example text with text without curly braces
Some example text with text with {}and {} and {}curly braces{some text}..
Some example text with text with {}and {} and {}curly braces{some text}.. and extented text with curly braces {}
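For comparison, here is a compact single-line form of the same pattern. It is a sketch, not the answer's exact code: the answer's third capturing group becomes a non-capturing (?:...), so $2 still holds the contents. It is applied to two made-up strings, one with three levels of nesting and two \added commands, and one whose outer brace is never closed.

```php
<?php
// Compact form of the recursive pattern above. Group 1 is the balanced
// {...} block that (?1) re-enters; group 2 holds its contents.
$regex = '/\\\\added\[.*?\](\{((?:(?>[^{}]+)|(?1))*)\})/s';

// Hypothetical inputs: three levels of nesting with two \added commands,
// and a block whose outer brace is never closed.
$deep       = 'A \added[a]{x{y{z}}} and \added[b]{w} end';
$unbalanced = 'B \added[a]{x{y}';

$deepResult       = preg_replace($regex, '$2', $deep);
$unbalancedResult = preg_replace($regex, '$2', $unbalanced);

echo $deepResult, "\n";       // A x{y{z}} and w end
echo $unbalancedResult, "\n"; // B \added[a]{x{y}  (no match, left unchanged)
```

The recursion places no limit on nesting depth, and preg_replace handles every \added command in the string; an unbalanced block simply fails to match.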
answered Oct 06 '22 01:10 by user3942918


Here is a regex that should work:

/\\added\[.*\]\{(.*(?:.*\{.*\}.*)*)\}/gU

Explanation

\\added matches the literal LaTeX command \added,

\[.*\] matches the command's optional argument, e.g. [author],

\{ matches the opening brace,

(.*(?:.*\{.*\}.*)*) captures the text; the inner group also allows for nested {...} or multiple {...} inside the target tag,

\} matches the closing brace.

Strategy

I do not treat the braces as recursively nested pairs, like this:

{ { {...} } }
c b a   a b c

where a, b and c each mark a matching pair.

Instead, I pair them like this:

{ { {...} } }
a b c   a b c

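See: DEMO

The last two examples in my demo show that it works correctly.

IMPORTANT: the U modifier is required here, because it makes every quantifier non-greedy by default; without it my regex will not work correctly. Also, g is not a valid PHP modifier (preg_replace already replaces every match), so drop it when using the pattern with the preg_* functions.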
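A sketch of this pattern in PHP, under two assumptions: the g flag is dropped, since PHP does not accept it, and the replacement is '$1', which the answer does not state. It runs the four sample strings from the question, then repeats the last one without U to show what the IMPORTANT note warns about: the greedy .* runs to the last } in the string.

```php
<?php
// Pattern from the answer without "g"; "U" keeps every quantifier lazy.
// Single-quoted, so "\\\\" reaches PCRE as "\\" (a literal backslash).
$lazy   = '/\\\\added\[.*\]\{(.*(?:.*\{.*\}.*)*)\}/U';
// Same pattern without "U": every quantifier is greedy, for comparison.
$greedy = '/\\\\added\[.*\]\{(.*(?:.*\{.*\}.*)*)\}/';

$strings = [
    'Some example text with \added[author]{text with curly braces{some text}..}',
    'Some example text with \added[author]{text without curly braces}',
    'Some example text with \added[author]{text with {}and {} and {}curly braces{some text}..}',
    'Some example text with \added[author]{text with {}and {} and {}curly braces{some text}..} and extented text with curly braces {}',
];

// Assumed replacement: keep only the captured text.
$results = [];
foreach ($strings as $string) {
    $results[] = preg_replace($lazy, '$1', $string);
}
echo implode("\n", $results), "\n";

// Without U, the capture stretches to the last "}" in the whole string.
$greedyResult = preg_replace($greedy, '$1', $strings[3]);
echo $greedyResult, "\n";
```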

answered Oct 05 '22 23:10 by fronthem