R: How to parse a vector of three values so that the output is a list with one element per input value

Question

I have a character vector of three values, "S0027A-E", "S0028A-D", and "S0029A-C":

input_string <- as.vector(c("S0027A-E", "S0028A-D", "S0029A-C"))

Each value should be expanded into its own comma-separated set of strings, so that the result is a list with one element per value in the input vector:

input_string	parsed_strings
"S0027A-E"	"S0027A", "S0027B", "S0027C", "S0027D", "S0027E"
"S0028A-D"	"S0028A", "S0028B", "S0028C", "S0028D"
"S0029A-C"	"S0029A", "S0029B", "S0029C"

I have written a parsing script, but its output is a single vector of 12 elements: "S0027A" "S0027B" "S0027C" "S0027D" "S0027E" "S0028A" "S0028B" "S0028C" "S0028D" "S0029A" "S0029B" "S0029C". Everything ends up in one vector instead of one set of strings per input value, as shown in the table.

# length of input_string
len_string = length(input_string)

# Extract the prefix, start, and end letters
library(stringr)

parsed_strings <- as.character()

for (i in 1:len_string){
  prefix <- str_extract(input_string[[i]][1], "^[A-Z]\\d{4}")
  range_part <- str_extract(input_string[[i]][1], "[A-Z]-[A-Z]$")
  start_letter <- substr(range_part, 1, 1)
  end_letter <- substr(range_part, 3, 3)
  output <- paste0(prefix, LETTERS[match(start_letter, LETTERS):match(end_letter, LETTERS)])
  parsed_strings <- c(parsed_strings, output)
}

The output must be as shown in the table, so I would greatly appreciate any advice on how to fix my code. Thanks in advance!

ThomasIsCoding · Accepted Answer

You can try

lapply(
    strsplit(input_string, split = "(?<=\\d)(?=\\D)|-", perl = TRUE),
    \(x) {
        paste0(x[1], LETTERS[LETTERS >= x[2] & LETTERS <= x[3]])
    }
)

which gives

[[1]]
[1] "S0027A" "S0027B" "S0027C" "S0027D" "S0027E"

[[2]]
[1] "S0028A" "S0028B" "S0028C" "S0028D"

[[3]]
[1] "S0029A" "S0029B" "S0029C"
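To see why this works, it can help to look at the intermediate split on its own. The following is only a small sketch; the name `parts` is introduced here for illustration and is not part of the answer. It also assumes a locale in which uppercase letters compare in alphabetical order, which the `>=` and `<=` comparisons on `LETTERS` rely on.

```r
input_string <- c("S0027A-E", "S0028A-D", "S0029A-C")

# Split at the zero-width boundary between a digit and a non-digit,
# or at the literal hyphen
parts <- strsplit(input_string, split = "(?<=\\d)(?=\\D)|-", perl = TRUE)

# Each element holds the prefix, the start letter, and the end letter
parts[[1]]
# "S0027" "A" "E"

# The logical index keeps the letters between the start and end letter
LETTERS[LETTERS >= parts[[1]][2] & LETTERS <= parts[[1]][3]]
# "A" "B" "C" "D" "E"
```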

If you want the output presented in a data frame, you can try

within(
    data.frame(input_string),
    parsed_strings <- lapply(
        strsplit(input_string, split = "(?<=\\d)(?=\\D)|-", perl = TRUE),
        \(x) {
            paste0(x[1], LETTERS[LETTERS >= x[2] & LETTERS <= x[3]])
        }
    )
)

which shows

  input_string                         parsed_strings
1     S0027A-E S0027A, S0027B, S0027C, S0027D, S0027E
2     S0028A-D         S0028A, S0028B, S0028C, S0028D
3     S0029A-C                 S0029A, S0029B, S0029C
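If you would rather keep a loop like the one in the question, the likely cause of the flattened result is that `c(parsed_strings, output)` appends to one growing vector. A minimal sketch of an alternative is to pre-allocate a list and assign into its i-th element. Note the assumptions: base R `sub()` is used here in place of the stringr calls so that the example has no package dependency, and the three patterns are illustrative choices rather than the asker's originals.

```r
input_string <- c("S0027A-E", "S0028A-D", "S0029A-C")

# Pre-allocate a list with one slot per input value
parsed_strings <- vector("list", length(input_string))

for (i in seq_along(input_string)) {
  s <- input_string[i]
  prefix       <- sub("[A-Z]-[A-Z]$", "", s)          # e.g. "S0027"
  start_letter <- sub("^.*([A-Z])-[A-Z]$", "\\1", s)  # e.g. "A"
  end_letter   <- sub("^.*-([A-Z])$", "\\1", s)       # e.g. "E"
  idx <- match(start_letter, LETTERS):match(end_letter, LETTERS)
  # Assign into the i-th list element instead of appending with c()
  parsed_strings[[i]] <- paste0(prefix, LETTERS[idx])
}

names(parsed_strings) <- input_string

# Optional: one comma-separated string per input value
collapsed <- vapply(parsed_strings, toString, character(1))
```

Here `parsed_strings` is a list of three character vectors, and `collapsed` is a named character vector with one comma-separated string per input value, in case a plain-text column is preferred over a list column.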

Andre Wildberg · Answer

A data.table approach

Prerequisites: a function seqChar() that returns the sequence of letters between a start and an end character, and a vector prefix holding the prefix of each input string.

input_string <- c("S0027A-E", "S0028A-D", "S0029A-C") # Data

seqChar <- function(a, b) { lett <- LETTERS
                            lett[which(lett == a):which(lett == b)] }

prefix <- sub("(.*)\\D-\\D$", "\\1", input_string)

library(data.table)

# prefix is added as a column so that each group sees only its own prefix
data.table(input_string, prefix)[, .(parsed_strings = 
  lapply(strsplit(sub(".*(\\D)-", "\\1", input_string), ""), \(x) 
    paste0(prefix, seqChar(x[1], x[2])))), by = input_string]

output

   input_string                     parsed_strings
         <char>                             <list>
1:     S0027A-E S0027A,S0027B,S0027C,S0027D,S0027E
2:     S0028A-D        S0028A,S0028B,S0028C,S0028D
3:     S0029A-C               S0029A,S0029B,S0029C
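Because paste0() recycles its arguments, a length-three prefix vector needs to be paired element by element with the letter ranges. A base R sketch of that pairing with Map() is shown below. The names `first`, `last`, and `parsed` are introduced here for illustration only, and seqChar() is redefined in compact form so that the block runs on its own.

```r
input_string <- c("S0027A-E", "S0028A-D", "S0029A-C")

# Letters from a to b, inclusive
seqChar <- function(a, b) LETTERS[which(LETTERS == a):which(LETTERS == b)]

prefix <- sub("(.*)\\D-\\D$", "\\1", input_string)  # "S0027" "S0028" "S0029"
first  <- sub(".*(\\D)-\\D$", "\\1", input_string)  # "A" "A" "A"
last   <- sub(".*-(\\D)$", "\\1", input_string)     # "E" "D" "C"

# Map() walks the three vectors in parallel, so each prefix is
# combined only with its own range of letters
parsed <- Map(function(p, a, b) paste0(p, seqChar(a, b)), prefix, first, last)
names(parsed) <- input_string
```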

Tags: string, parsing, r

Wilfredo de Vera
