
Selecting all but the first element of a vector in a data frame

I have some data that looks like this:

X1
A,B,C,D,E
A,B
A,B,C,D
A,B,C,D,E,F

I want to generate one column that holds the first element of each vector ("A"), and another column that holds all the rest of the values ("B","C" etc.):

X1              Col1    Col2
A,B,C,D,E       A       B,C,D,E
A,B             A       B
A,B,C,D         A       B,C,D
A,B,C,D,E,F     A       B,C,D,E,F

I have tried the following:

library(dplyr)

testdata <- data.frame(X1 = c("A,B,C,D,E",
                              "A,B",
                              "A,B,C,D",
                              "A,B,C,D,E,F")) %>%
  mutate(Col1 = sapply(strsplit(X1, ","), "[", 1),
         Col2 = sapply(strsplit(X1, ","), "[", -1))

However, I cannot seem to get rid of the pesky vector brackets (the c(...) wrapper) around the values in Col2. Is there any way of doing this?

Haakonkas asked Dec 29 '25 13:12


2 Answers

You can use tidyr::separate with extra = "merge":

testdata %>% 
  tidyr::separate(X1, into = c("Col1","Col2"), sep = ",", extra = "merge", remove = FALSE)

           X1 Col1      Col2
1   A,B,C,D,E    A   B,C,D,E
2         A,B    A         B
3     A,B,C,D    A     B,C,D
4 A,B,C,D,E,F    A B,C,D,E,F
Maël answered Jan 01 '26 04:01
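
For context on the brackets in the question: when the pieces have different lengths, sapply cannot simplify the result, so Col2 becomes a list column and prints as c(...). A minimal base R sketch that stays close to the original attempt is to collapse each piece back into a single string with paste. The helper name parts is only illustrative.

```r
# Same sample data as in the question.
testdata <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F"),
  stringsAsFactors = FALSE
)

# Split once and reuse the result ('parts' is an illustrative name).
parts <- strsplit(testdata$X1, ",", fixed = TRUE)

# First element of each split.
testdata$Col1 <- vapply(parts, function(p) p[1], character(1))

# Everything except the first element, pasted back into one string per row,
# so the column is a plain character vector rather than a list.
testdata$Col2 <- vapply(
  parts,
  function(p) paste(p[-1], collapse = ","),
  character(1)
)

print(testdata)
```

With vapply and character(1), R raises an error if any row does not yield exactly one string, which guards against silently getting a list column again.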


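A possible solution, using tidyr::separate with a lookbehind regex that splits only at the comma right after the first character (this assumes the first element is a single character, as in the sample data):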

library(tidyverse)

df <- data.frame(
  stringsAsFactors = FALSE,
  X1 = c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F")
)

df %>% 
  separate(X1, into = str_c("col", 1:2), sep = "(?<=^.),", remove = FALSE)

#>            X1 col1      col2
#> 1   A,B,C,D,E    A   B,C,D,E
#> 2         A,B    A         B
#> 3     A,B,C,D    A     B,C,D
#> 4 A,B,C,D,E,F    A B,C,D,E,F
PaulS answered Jan 01 '26 06:01
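
If the first element may be longer than one character, the pattern "(?<=^.)," no longer matches. One hedged alternative is to split at the first comma with base R's sub, which needs no packages. The value "AB,C,D" below is a made-up row added only to illustrate a multi-character first element.

```r
# Sample data; "AB,C,D" is a hypothetical row, not from the question.
df <- data.frame(
  X1 = c("A,B,C,D,E", "A,B", "AB,C,D", "A,B,C,D,E,F"),
  stringsAsFactors = FALSE
)

# Drop the first comma and everything after it: keeps the first element.
df$col1 <- sub(",.*$", "", df$X1)

# Drop everything up to and including the first comma: keeps the rest.
# Note: a value with no comma at all is left unchanged by this call.
df$col2 <- sub("^[^,]*,", "", df$X1)

print(df)
```

Because sub replaces only the first match, the remaining commas in col2 are left untouched.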
