I have a character vector of length 3 with the values "S0027A-E", "S0028A-D" and "S0029A-C":
input_string <- c("S0027A-E", "S0028A-D", "S0029A-C")
The parsed output should be a list with one element per input value, each element holding the expanded strings for that value:
input_string | parsed_strings |
---|---|
"S0027A-E" | "S0027A", "S0027B", "S0027C", "S0027D", "S0027E" |
"S0028A-D" | "S0028A", "S0028B", "S0028C", "S0028D" |
"S0029A-C" | "S0029A", "S0029B", "S0029C" |
I have already written a parsing script, but its output is a single character vector of 12 elements, "S0027A" "S0027B" "S0027C" "S0027D" "S0027E" "S0028A" "S0028B" "S0028C" "S0028D" "S0029A" "S0029B" "S0029C", instead of one group per input value as shown in the table.
library(stringr)

# length of input_string
len_string <- length(input_string)

# Extract the prefix, start, and end letters
parsed_strings <- character()
for (i in 1:len_string) {
  prefix <- str_extract(input_string[i], "^[A-Z]\\d{4}")
  range_part <- str_extract(input_string[i], "[A-Z]-[A-Z]$")
  start_letter <- substr(range_part, 1, 1)
  end_letter <- substr(range_part, 3, 3)
  output <- paste0(prefix, LETTERS[match(start_letter, LETTERS):match(end_letter, LETTERS)])
  # c() appends to one flat character vector, which is why all 12 codes end up together
  parsed_strings <- c(parsed_strings, output)
}
The output must match the table, so I would greatly appreciate any advice on fixing my code. Thanks in advance!
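The flat result comes from `parsed_strings <- c(parsed_strings, output)`: `c()` concatenates character vectors into one longer vector. A minimal sketch of a fix that keeps the loop from the question but stores each result in its own list slot. The helper name `parse_ranges` is hypothetical, and base R `regmatches()`/`regexpr()` stand in for `stringr::str_extract()` only to avoid the package dependency:

```r
input_string <- c("S0027A-E", "S0028A-D", "S0029A-C")

# Hypothetical helper: expand each "S0027A-E"-style string into its codes
parse_ranges <- function(x) {
  # Preallocate one list slot per input string
  parsed_strings <- vector("list", length(x))
  for (i in seq_along(x)) {
    # Base R equivalents of str_extract() for these patterns
    prefix <- regmatches(x[i], regexpr("^[A-Z]\\d{4}", x[i], perl = TRUE))      # e.g. "S0027"
    range_part <- regmatches(x[i], regexpr("[A-Z]-[A-Z]$", x[i], perl = TRUE))  # e.g. "A-E"
    start_letter <- substr(range_part, 1, 1)
    end_letter <- substr(range_part, 3, 3)
    letters_seq <- LETTERS[match(start_letter, LETTERS):match(end_letter, LETTERS)]
    # Assign into slot i instead of concatenating with c()
    parsed_strings[[i]] <- paste0(prefix, letters_seq)
  }
  # Label each element with the string it came from
  names(parsed_strings) <- x
  parsed_strings
}

parsed_strings <- parse_ranges(input_string)
parsed_strings
```

The only substantive change is `parsed_strings[[i]] <- ...` on a preallocated list; the rest mirrors the original loop.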
You can try
lapply(
strsplit(input_string, split = "(?<=\\d)(?=\\D)|-", perl = TRUE),
\(x) {
paste0(x[1], LETTERS[LETTERS >= x[2] & LETTERS <= x[3]])
}
)
which gives
[[1]]
[1] "S0027A" "S0027B" "S0027C" "S0027D" "S0027E"
[[2]]
[1] "S0028A" "S0028B" "S0028C" "S0028D"
[[3]]
[1] "S0029A" "S0029B" "S0029C"
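To see what the split step does, it helps to look at the intermediate result for one string: the pattern splits at the zero-width position between the last digit and the first letter (`(?<=\\d)(?=\\D)`) and at the hyphen, leaving the prefix, the start letter and the end letter. The comparison on `LETTERS` then keeps every letter between the two.

```r
# Split one input string into prefix, start letter and end letter
parts <- strsplit("S0027A-E", split = "(?<=\\d)(?=\\D)|-", perl = TRUE)[[1]]
parts  # c("S0027", "A", "E")

# Single uppercase letters compare alphabetically, so this keeps A through E
LETTERS[LETTERS >= parts[2] & LETTERS <= parts[3]]
```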
If you want the output presented in a data frame, you can try
within(
data.frame(input_string),
parsed_strings <- lapply(
strsplit(input_string, split = "(?<=\\d)(?=\\D)|-", perl = TRUE),
\(x) {
paste0(x[1], LETTERS[LETTERS >= x[2] & LETTERS <= x[3]])
}
)
)
which shows
input_string parsed_strings
1 S0027A-E S0027A, S0027B, S0027C, S0027D, S0027E
2 S0028A-D S0028A, S0028B, S0028C, S0028D
3 S0029A-C S0029A, S0029B, S0029C
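If a plain character column is preferred over a list column (for example, to write the result to a CSV file), each list element can be collapsed into one comma-separated string. A sketch assuming the same `input_string`; the names `parsed` and `df` are illustrative:

```r
input_string <- c("S0027A-E", "S0028A-D", "S0029A-C")

# Same split-and-expand step as above
parsed <- lapply(
  strsplit(input_string, split = "(?<=\\d)(?=\\D)|-", perl = TRUE),
  function(x) paste0(x[1], LETTERS[LETTERS >= x[2] & LETTERS <= x[3]])
)

# Collapse each character vector into one string, e.g. "S0029A, S0029B, S0029C"
df <- data.frame(
  input_string,
  parsed_strings = vapply(parsed, paste, character(1), collapse = ", "),
  stringsAsFactors = FALSE
)
df
```

`vapply()` with `character(1)` guarantees exactly one string per row; the extra `collapse = ", "` argument is passed through to `paste()`.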
A data.table approach
Prerequisite: a helper function seqChar() that returns the sequence of letters from a start character to an end character. The prefix (e.g. "S0027") is extracted from each input string with sub().
input_string <- c("S0027A-E", "S0028A-D", "S0029A-C") # Data

# Return the letters from a to b inclusive, e.g. seqChar("A", "C") gives "A" "B" "C"
seqChar <- function(a, b) {
  lett <- LETTERS
  lett[which(lett == a):which(lett == b)]
}

library(data.table)

data.table(input_string)[, .(parsed_strings =
  lapply(strsplit(sub(".*(\\D)-", "\\1", input_string), ""), \(x)
    # Take the prefix from this group's string (length 1), not from a vector of all prefixes
    paste0(sub("(.*)\\D-\\D$", "\\1", input_string), seqChar(x[1], x[2])))),
  by = input_string]
Output:
input_string parsed_strings
<char> <list>
1: S0027A-E S0027A,S0027B,S0027C,S0027D,S0027E
2: S0028A-D S0028A,S0028B,S0028C,S0028D
3: S0029A-C S0029A,S0029B,S0029C
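The prefix has to be taken from the string currently being expanded rather than from a vector holding all three prefixes: `paste0()` recycles shorter arguments silently (it gives no warning when the lengths are not multiples of each other), so pasting three prefixes onto five letters interleaves them. A small illustration:

```r
prefixes <- c("S0027", "S0028", "S0029")

# All three prefixes recycled against five letters: the prefixes get mixed up
mixed <- paste0(prefixes, LETTERS[1:5])
mixed  # "S0027A" "S0028B" "S0029C" "S0027D" "S0028E"

# One prefix per call gives the intended expansion
single <- paste0(prefixes[1], LETTERS[1:5])
single  # "S0027A" "S0027B" "S0027C" "S0027D" "S0027E"
```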