
strsplit split on either or depending on

Once again I'm struggling with strsplit. I'm transforming some strings to data frames, but my strings contain a forward slash (/) and some white space that keep bugging me. I could work around it, but I'm eager to learn whether strsplit can split on either one or the other, depending on what is present. My working example below should illustrate the issue.

The strsplit function I'm currently using:

str_to_df <- function(string){
t(sapply(1:length(string), function(x) strsplit(string, "\\s+")[[x]])) }

One type of string I get:

string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
str_to_df(string1)
#>      [,1]    [,2]  
#> [1,] "One"   "58/2"
#> [2,] "Two"   "22/3"
#> [3,] "Three" "15/5"

Another type I get in the same spot:

string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string2)
#>      [,1]    [,2] [,3] [,4]
#> [1,] "One"   "58" "/"  "2" 
#> [2,] "Two"   "22" "/"  "3" 
#> [3,] "Three" "15" "/"  "5" 

They obviously produce different outputs, and I can't figure out how to code a solution that works for both. Below is my desired outcome. Thank you in advance!

desired_outcome <- structure(c("One", "Two", "Three", "58", "22",
                               "15", "2", "3", "5"), .Dim = c(3L, 3L))
desired_outcome
#>      [,1]    [,2] [,3]
#> [1,] "One"   "58" "2" 
#> [2,] "Two"   "22" "3" 
#> [3,] "Three" "15" "5"
Eric Fail asked Dec 17 '22 23:12


1 Answer

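Split on a character class that matches either a slash or white space. With the + quantifier, any run of those characters (such as " / ") counts as a single separator, so both kinds of string give three fields: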

str_to_df <- function(string){
  t(sapply(1:length(string), function(x) strsplit(string, "[/[:space:]]+")[[x]])) }

string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

str_to_df(string1)
#      [,1]    [,2] [,3]
# [1,] "One"   "58" "2" 
# [2,] "Two"   "22" "3" 
# [3,] "Three" "15" "5"

str_to_df(string2)
#      [,1]    [,2] [,3]
# [1,] "One"   "58" "2" 
# [2,] "Two"   "22" "3" 
# [3,] "Three" "15" "5"
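Since strsplit is already vectorised over its input, the per-index sapply is not strictly needed. A minimal sketch of the same idea, assuming every string yields the same number of fields (str_to_df2 is just a hypothetical name for illustration):

```r
# Hypothetical variant of str_to_df(); assumes each string splits into
# the same number of pieces, otherwise rbind() recycles with a warning.
str_to_df2 <- function(string) {
  parts <- strsplit(string, "[/[:space:]]+")  # split the whole vector once
  do.call(rbind, parts)                       # stack the pieces row by row
}

string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

m1 <- str_to_df2(string1)
m2 <- str_to_df2(string2)
identical(m1, m2)
```

Both inputs should give the same 3 x 3 character matrix as in the desired outcome.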

Another approach with tidyr could be:

library(dplyr)  # as_tibble() and %>%
library(tidyr)  # separate()

string1 %>% 
  as_tibble() %>% 
  separate(value, into = c("Col1", "Col2", "Col3"), sep = "[/[:space:]]+")

# A tibble: 3 x 3
#   Col1  Col2  Col3 
#   <chr> <chr> <chr>
# 1 One   58    2    
# 2 Two   22    3    
# 3 Three 15    5 
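If you want a plain data frame without extra packages, something along these lines might do. This is only a sketch: the column names are borrowed from the tidyr example, and converting the last two columns to integer is my assumption about what you want to do with them.

```r
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

# Split on runs of slashes and/or white space, then stack into a matrix
parts <- do.call(rbind, strsplit(string2, "[/[:space:]]+"))

# Convert to a data frame, keeping the columns as character for now
df <- as.data.frame(parts, stringsAsFactors = FALSE)
names(df) <- c("Col1", "Col2", "Col3")  # illustrative names only

# Assumption: the two numeric-looking columns should be integers
num_cols <- c("Col2", "Col3")
df[num_cols] <- lapply(df[num_cols], as.integer)

str(df)
```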
kath answered Jan 07 '23 05:01