I have some ordered test results encoded in a character string. The string can be of arbitrary length. Each digit in the string represents a test result. In the following, for example, there are four test results represented:
2069
I want to tidy these up in R by splitting the string into individual observations. No problem with strsplit or stringr::str_split, which return four values that will become my observations.
strsplit("2069" %>% as.character(), split = "") %>% unlist()
[1] "2" "0" "6" "9"
Now, however, I have realized that some results are values greater than 9. These two-digit values have been encoded with parentheses to make clear they are not individual results.
For example, in the following case I still have four values, but some have been enclosed in parentheses to group the values larger than 9.
2(10)1(12)
I'm struggling with a way to break these up so that I get
[1] "2" "10" "1" "12"
Appreciate any guidance. Thanks.
Updated: a pattern match based on the OP's new pattern shown in the comments. Here, we use str_extract_all to extract either one or more digits that follow an opening parenthesis (a regex lookbehind, (?<=[(])\\d+) or (|) any single character that is not a parenthesis ([^()]).
library(stringr)
str_extract_all(str1, "(?<=[(])\\d+|[^()]")
[[1]]
[1] "2" "10" "1" "12"
[[2]]
[1] "2" "0" "6" "9"
[[3]]
[1] "2" "15"
[[4]]
[1] "2" "1" "3" "1"
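To see why the alternation behaves this way, it can help to run each half of the pattern on its own. The following is a minimal sketch in base R (regmatches with gregexpr and perl = TRUE, used here instead of stringr so that it runs without extra packages); the helper name extract_all is made up for illustration and is not part of any library.

```r
x <- "2(10)1(12)"

# Hypothetical helper: all matches of a PCRE pattern in a single string
extract_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

# First alternative alone: runs of digits directly preceded by "("
in_parens <- extract_all("(?<=[(])\\d+", x)
in_parens
# [1] "10" "12"

# Second alternative alone: every single character that is not a parenthesis
singles <- extract_all("[^()]", x)
singles
# [1] "2" "1" "0" "1" "1" "2"

# Combined: at each position the lookbehind alternative is tried first,
# so a parenthesised number is consumed whole before [^()] can split it
combined <- extract_all("(?<=[(])\\d+|[^()]", x)
combined
# [1] "2"  "10" "1"  "12"
```

The order of the alternatives matters: with [^()] listed first, the digits inside the parentheses would be matched one at a time.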
Testing on the OP's extra pattern:
str_extract_all(str2, "(?<=[(])\\d+|[^()]")
[[1]]
[1] "2" "10" "1" "12"
[[2]]
[1] "2" "0" "6" "9"
[[3]]
[1] "2" "15"
[[4]]
[1] "2" "1" "3" "1"
[[5]]
[1] "10" "0" "2" "0" "1"
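Since the goal in the question is one observation per test result, the extracted list can be reshaped into a long table. This is only a sketch under a few assumptions: it uses base R regmatches/gregexpr with the same pattern rather than stringr, the column names string_id, position and value are invented for illustration, and as.integer assumes every extracted piece is numeric (anything else would become NA with a warning).

```r
str2 <- c("2(10)1(12)", "2069", "2(15)", "2131", "(10)0201")
pattern <- "(?<=[(])\\d+|[^()]"

# One character vector of results per input string
res <- regmatches(str2, gregexpr(pattern, str2, perl = TRUE))
n <- lengths(res)

# Long format: one row per test result (column names are illustrative)
long <- data.frame(
  string_id = rep(seq_along(res), n),  # which input string the result came from
  position  = sequence(n),             # order of the result within that string
  value     = as.integer(unlist(res))  # assumes all pieces are digits
)
head(long, 4)
```

Keeping string_id and position preserves the original ordering, which matters because the results are described as ordered.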
Earlier solutions (based on the assumption that all numbers greater than 9 are wrapped in parentheses):
We may split on the parentheses in base R
unlist(strsplit(str1[1], "\\(|\\)"))
[1] "2" "10" "1" "12"
If both cases are present, an option is to get the index of the elements that contain parentheses and split the two groups separately
i1 <- grepl("\\(|\\)", str1)
lst1 <- vector('list', length(str1))
lst1[i1] <- strsplit(str1[i1], "\\(|\\)")
lst1[!i1] <- strsplit(str1[!i1], "")
unlist(lst1)
[1] "2" "10" "1" "12" "2" "0" "6" "9" "2" "15" "2" "1" "3" "1"
Another option is to use ifelse with grepl to create a single delimiter and then use strsplit
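# strings with parentheses: turn each parenthesis into a comma
# strings without: insert a comma between every pair of adjacent characters
lst1 <- strsplit(trimws(ifelse(grepl("\\(|\\)", str1),
    gsub("\\(|\\)", ",", str1),
    gsub("(?<=.)(?=.)", ",", str1, perl = TRUE)),
  whitespace = ","), ",")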
lst1
[[1]]
[1] "2" "10" "1" "12"
[[2]]
[1] "2" "0" "6" "9"
[[3]]
[1] "2" "15"
[[4]]
[1] "2" "1" "3" "1"
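These earlier approaches depend on the stated assumption, and it may be worth seeing what happens when a string mixes a parenthesised value with a run of unparenthesised digits, as in the OP's extra pattern. A small sketch in base R, with the regmatches call standing in for str_extract_all so that no package is needed:

```r
x <- "(10)0201"

# Splitting on parentheses: a match at the start yields a leading "",
# and the trailing digits stay glued together
split_only <- strsplit(x, "\\(|\\)")[[1]]
split_only
# [1] ""     "10"   "0201"

# The lookbehind pattern handles the same string digit by digit
extracted <- regmatches(x, gregexpr("(?<=[(])\\d+|[^()]", x, perl = TRUE))[[1]]
extracted
# [1] "10" "0"  "2"  "0"  "1"
```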
data
str1 <- c("2(10)1(12)", "2069", "2(15)", "2131")
str2 <- c(str1, "(10)0201")