Given the following string of nested brackets:
a = "[[[][]]][[[][][]]]"
I am trying to find each matching pair of opening and closing brackets in a and label both positions with a common ID. For example, I want to create a vector of IDs that looks like this:
b = c(1,2,3,3,4,4,2,1,5,6,7,7,8,8,9,9,6,5)
Here, the 1s and 2s in vector b mark the following pairs of brackets, and so on:
[[[][]]][[[][][]]]
1      1
[[[][]]][[[][][]]]
 2    2
Any input in this regard is much appreciated.
It's ugly
a <- "[[[][]]][[[][][]]]"
s <- unlist(strsplit(a, ''))
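# number the opening brackets 1, 2, ...; closing brackets stay 0 for now
i <- cumsum(s == '[') * (s == '[')
# fill in the closing brackets (the remaining zeros) from left to right
while (any(idx <- i == 0)) {
  ii <- min(which(idx))
  # count the IDs seen so far; an ID seen only once has not been closed yet
  jj <- table(i[1:ii])
  # the innermost open bracket is the one with the largest such ID
  i[ii] <- max(as.integer(names(jj[jj < 2])))
}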
i
# [1] 1 2 3 3 4 4 2 1 5 6 7 7 8 8 9 9 6 5
@rawr, no, this is ugly:
library(data.table)
d = data.table(x = strsplit(a, "")[[1]])
d[ , g := cumsum(shift(cumsum(x == "[") == cumsum(x == "]"), fill = FALSE))]
d[ , ix := d[d[ , .I[1:(.N / 2)], by = g]$V1, {
  i = cumsum(x == "[")
  c(i, rev(i))}, by = g]$V1]
d[ , pair := .GRP, by = .(ix, (rowid(ix) - 1) %/% 2)]
I assume speed is not an issue here, but just out of curiosity I found my data.table monstrosity to be faster on larger strings, e.g. a = paste(rep("[[[][]]][[[][][]]]", 1000), collapse = "").
all.equal(d$pair, i)
# TRUE
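For comparison, here is a minimal sketch of the same labelling done with an explicit stack, which is the textbook way to match brackets. It is not taken from either answer above: the function name pair_ids and the error messages are made up for illustration, and it assumes the input contains only "[" and "]". Each opening bracket gets the next ID and is pushed onto the stack; each closing bracket takes the ID on top of the stack and pops it.

```r
# Hypothetical helper (not from the answers above): give each matching
# pair of brackets a shared ID, using a stack of currently open IDs.
pair_ids <- function(a) {
  s <- strsplit(a, "")[[1]]
  ids <- integer(length(s))
  stack <- integer(0)  # IDs of brackets opened but not yet closed
  next_id <- 0L
  for (k in seq_along(s)) {
    if (s[k] == "[") {
      next_id <- next_id + 1L
      ids[k] <- next_id
      stack <- c(stack, next_id)          # push
    } else if (s[k] == "]") {
      if (length(stack) == 0L) stop("unmatched ']' at position ", k)
      ids[k] <- stack[length(stack)]      # closer takes the most recent open ID
      stack <- stack[-length(stack)]      # pop
    } else {
      stop("unexpected character at position ", k)
    }
  }
  if (length(stack) > 0L) stop("unmatched '[' in input")
  ids
}

pair_ids("[[[][]]][[[][][]]]")
# [1] 1 2 3 3 4 4 2 1 5 6 7 7 8 8 9 9 6 5
```

This makes a single pass over the string and assumes nothing about the input beyond it being balanced; growing the stack with c() is fine for a sketch but would want preallocation for very long strings.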