
R: parse nested parentheses

Tags:

regex

r

I would like to parse nested parentheses using R. No, this is not JSON. I have seen examples using Perl, PHP, and Python, but I am having trouble getting anything to work in R. Here is an example of some data:

(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)

I would like to split this string based on the three parent parentheses into three separate strings:

(a(a(a)(aa(a)a)a)a)

((b(b)b)b)

(((cc)c)c)

One of the challenges I am facing is the lack of a consistent structure: the number of child pairs inside each parent pair varies, and so does the number of consecutive opening or closing parentheses. Notice the consecutive opening parentheses in the data with Bs and with Cs. This has made attempts to use regex very difficult. Also, in my real data the contents of one parent pair share many characters with the contents of the others, so searching for all "a"s or "b"s is not possible. I fabricated this data to make the three parent pairs easier to see.

Basically I am looking for a function that identifies parent parentheses. In other words, a function that finds parentheses that are not contained within other parentheses, and returns every such group in a given string.

Any ideas? I appreciate the help.

Phil_T asked Oct 25 '25 05:10


1 Answer

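Here is one approach, adapted directly from the Regex Recursion pattern \\((?>[^()]|(?R))*\\). The (?R) token recurses into the whole pattern, so each match is one balanced top-level group: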

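s <- "(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)"
# perl = TRUE enables PCRE, which the recursive pattern needs
matched <- gregexpr("\\((?>[^()]|(?R))*\\)", s, perl = TRUE)
# cut out each match using its start position and match length
substring(s, matched[[1]], matched[[1]] + attr(matched[[1]], "match.length") - 1)
# [1] "(a(a(a)(aa(a)a)a)a)" "((b(b)b)b)"          "(((cc)c)c)"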
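
If the recursive pattern feels opaque, the same result can be cross-checked without regex by tracking nesting depth. This is only a minimal sketch, not part of the approach above: split_top_level is a hypothetical helper name, and the sketch assumes that any text outside the top-level groups should be ignored and that unbalanced input should raise an error.

```r
# Hypothetical helper (an illustration, not part of the original answer):
# find top-level parenthesized groups by tracking nesting depth.
split_top_level <- function(s) {
  chars <- strsplit(s, "")[[1]]
  # +1 for "(", -1 for ")", 0 for any other character
  delta <- (chars == "(") - (chars == ")")
  # nesting depth immediately after each character
  depth <- cumsum(delta)
  # assumption: unbalanced input is treated as an error
  if (any(depth < 0) || (length(depth) > 0 && depth[length(depth)] != 0)) {
    stop("unbalanced parentheses")
  }
  # a top-level group opens at a "(" that raises the depth to 1 ...
  starts <- which(chars == "(" & depth == 1)
  # ... and closes at a ")" that brings the depth back to 0
  ends <- which(chars == ")" & depth == 0)
  substring(s, starts, ends)
}

s <- "(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)"
groups <- split_top_level(s)
print(groups)
```

For the example string this should return the same three groups as the regex. The depth counter also makes the definition explicit: a parent pair is one that opens when nothing else is open.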
Psidom answered Oct 27 '25 18:10