I am surprised not to easily find a similar question with an answer on SO. I would like to match everything inside certain function calls. The idea is to remove the calls that are useless and keep only their contents:
foo(some (content)) --> some (content)
So I am trying to match everything inside the function call, which can itself include parentheses. Here is my PCRE regex:
(?<name>\w+)\s*\(\K
(?<e>
[^()]+
|
[^()]*
\((?&e)\)
[^()]*
)*
(?=\))
https://regex101.com/r/gfMAIM/1
Unfortunately it doesn't work and I don't really understand why.
Your Group e pattern does not do the right job: the * quantifier sits outside the group, so each (?&e) call runs the group body only once and a nested level can hold at most one (...) substring. It needs to match as many (...) substrings as there are present, and thus the subroutine call needs to be inside a * or + quantified group. The pattern can even be "simplified" to (?<e>[^()]*(?:\((?&e)\)[^()]*)*).
Note that your Group e pattern is equal to (?<e>[^()]+|\((?&e)\))*. The [^()]* parts around \((?&e)\) are redundant since the [^()]+ alternative will consume the chars other than ( and ) on the current depth level.
Also, you quantified the Group e pattern, making it a repeated capturing group that only keeps the text matched during the last iteration.
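The "last iteration only" behaviour of a repeated capturing group is not specific to PCRE. As a small illustration (Python's built-in re module is used here only as an assumption-free stand-in, since it shares this behaviour; the variable names are made up for the example):

```python
import re

# The quantifier is applied to the capturing group itself, so the group
# is re-entered once per character and overwritten each time.
repeated = re.fullmatch(r"(\w)*", "abc")
print(repeated.group(1))  # c

# With the quantifier inside the group, the whole run is captured.
wrapped = re.fullmatch(r"(\w*)", "abc")
print(wrapped.group(1))  # abc
```

This is why the quantifier belongs inside Group e if the group is supposed to hold the complete contents.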
You may use
(?<name>\w+)\s*\(\K(?<e>[^()]*(?:\((?&e)\)[^()]*)*)(?=\))
See the regex demo
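To make the intended result concrete, here is a rough sketch of the same extraction done outside PCRE. Python's built-in re module has no recursion and no \K, so this swaps the recursive pattern for an explicit depth counter; call_contents and CALL_START are hypothetical names chosen for the example, not part of any library.

```python
import re

# Function name, optional whitespace, opening parenthesis.
CALL_START = re.compile(r"(\w+)\s*\(")

def call_contents(text):
    """Return (name, contents) pairs for calls with balanced parentheses."""
    results = []
    pos = 0
    while (m := CALL_START.search(text, pos)) is not None:
        depth, i = 1, m.end()
        # Walk forward until the parenthesis opened by the call is closed.
        while i < len(text) and depth > 0:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        if depth == 0:
            results.append((m.group(1), text[m.end():i - 1]))
            pos = i  # continue after the closing parenthesis
        else:
            pos = m.end()  # unbalanced call: skip it and keep searching
    return results

print(call_contents("foo(some (content))"))  # [('foo', 'some (content)')]
```

Like the regex, the sketch reports only the outermost call and treats everything up to its matching ) as the contents.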
Details
(?<name>\w+)\s*\(\K - 1+ word chars, 0+ whitespaces and ( that are omitted from the match
(?<e> - start of Group e
  [^()]* - 0+ chars other than ( and )
  (?: - start of a non-capturing group:
    \( - a ( char
    (?&e) - Group e pattern recursed
    \) - a ) char
    [^()]* - 0+ chars other than ( and )
  )* - 0 or more repetitions
) - end of e group
(?=\)) - a ) must be immediately to the right of the current location.

The following regex does the matching without taking extra steps:
(?<name>\w+)\s*(\((?<e>([^()]*+|(?2))+)\))
See live demo here
But that doesn't match the following strings, which contain unbalanced parentheses inside a quoted string:
foo(bar = ')')
foo(bar(john = "(Doe..."))
So what you should look for is:
(?<name>\w+)\s*(\((?<e>([^()'"]*+|"(?>[^"\\]*+|\\.)*"|'(?>[^'\\]*+|\\.)*'|(?2))+)\))
See live demo here
Regex breakdown:
(?<name>\w+)\s* - Match function name and trailing spaces
( - Start of capturing group #2 (the group that (?2) recurses)
  \( - Match a literal (
  (?<e> - Start of named capturing group e
    ( - Start of capturing group #4
      [^()'"]*+ - Match anything except ()'"
      | - Or
      "(?>[^"\\]*+|\\.)*" - Match anything between double quotes
      | - Or
      '(?>[^'\\]*+|\\.)*' - Match anything between single quotes
      | - Or
      (?2) - Recurse capturing group #2, i.e. a whole balanced (...)
    )+ - Repeat as much as possible, at least once
  ) - End of capturing group e
  \) - Match ) literally
) - End of capturing group #2
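The quote handling can be tried out without a PCRE engine. The sketch below is only an approximation of the idea: it reuses the two quoted-string alternatives (written without possessive or atomic constructs) as a tokenizer and counts depth by hand instead of recursing. TOKEN, CALL_START and call_contents_quoted are hypothetical names introduced for this example.

```python
import re

# One token per step: a quoted string (backslash escapes allowed),
# a single parenthesis, or any other single character.
TOKEN = re.compile(r'''"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|[()]|.''', re.DOTALL)
CALL_START = re.compile(r"(\w+)\s*\(")

def call_contents_quoted(text):
    """Return (name, contents) of the first call, ignoring parentheses in quotes."""
    m = CALL_START.search(text)
    if m is None:
        return None
    depth = 1
    for tok in TOKEN.finditer(text, m.end()):
        if tok.group() == "(":
            depth += 1
        elif tok.group() == ")":
            depth -= 1
            if depth == 0:
                return m.group(1), text[m.end():tok.start()]
    return None  # the call is never closed

print(call_contents_quoted("foo(bar = ')')"))
print(call_contents_quoted('foo(bar(john = "(Doe..."))'))
```

Because a quoted string is consumed as a single token, a ( or ) inside it never reaches the depth counter, which is the same effect the quoted alternatives have in the regex.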