dplyr split string into a comma separated list

Tags:

r

dplyr

I'm trying to use dplyr to turn each string in a column into a comma-separated string of its characters, and I'm not having much luck.

dat<-data.frame(key=1:4,labels=c('a','ab','abc','b'))

I'm trying to get the labels column to be c('a','a,b','a,b,c','b')

I've tried all of the below variations but nothing seems to work.

dat %>%
  mutate(labels=str_split(labels,''))

dat %>%
  mutate(labels=str_split(labels,'')[[1]])

dat %>%
  mutate(labels=paste(str_split(labels,''),collapse=','))
Ben Carlson asked Oct 21 '25 17:10


2 Answers

Neither dplyr nor mutate is the source of the problem. The issue is that str_split returns a list (one character vector per input string), and your attempts treat that list as if it were a plain vector.
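To make the list-versus-vector point concrete, here is a minimal base R sketch. It assumes that strsplit behaves like stringr's str_split for an empty pattern (both return a list with one element per input string); the variable names pieces and joined are only illustrative.

```r
# Base R only: strsplit() returns a list with one character vector per input string.
labels <- c("a", "ab", "abc", "b")
pieces <- strsplit(labels, "")
str(pieces)  # a list of 4 character vectors, not a character vector

# paste() over the whole list converts each element to a single string first,
# so it does not join the characters inside each element
paste(pieces, collapse = ",")

# Joining within each list element gives one string per row
joined <- vapply(pieces, paste, character(1), collapse = ",")
joined
```

This is why pasting the list directly collapses everything into one string, and why indexing with [[1]] only ever uses the pieces of the first label.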

I would write a little function to do it:

comma_sep = function(x) {
    x = strsplit(as.character(x), "")
    unlist(lapply(x, paste, collapse = ','))
}

You can then call it inside mutate:

mutate(dat, labels = comma_sep(labels))
#   key labels
# 1   1      a
# 2   2    a,b
# 3   3  a,b,c
# 4   4      b

But of course you could jam the meat of the function into that one line as well.
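As a rough sketch of that inline version (assuming dplyr is installed, and using vapply rather than unlist(lapply(...)) purely as a matter of taste):

```r
library(dplyr)

dat <- data.frame(key = 1:4, labels = c("a", "ab", "abc", "b"))

result <- dat %>%
  mutate(labels = vapply(
    strsplit(as.character(labels), ""),  # as.character() guards against factor columns
    paste, character(1), collapse = ","  # join the characters of each label with commas
  ))
result
```

vapply also checks that every element comes back as a single string, which unlist would not complain about.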

Gregor Thomas answered Oct 23 '25 06:10


Replace each non-word-boundary position (\B in the regex) with a comma like this:

dat %>% mutate(labels = gsub("\\B", ",", labels, perl = TRUE))

or with a slightly more complex regex but without perl=TRUE, replace each character that is followed by a non-boundary with that character followed by comma:

dat %>% mutate(labels = gsub("(.)\\B", "\\1,", labels))

Either one gives:

  key labels
1   1      a
2   2    a,b
3   3  a,b,c
4   4      b
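A quick way to compare the two patterns without dplyr is to run them on a plain character vector. This is only a sketch on the sample labels; with_perl and with_group are illustrative names.

```r
labels <- c("a", "ab", "abc", "b")

# Insert a comma at every position that is not a word boundary (needs perl = TRUE)
with_perl <- gsub("\\B", ",", labels, perl = TRUE)

# Capture each character that is followed by a non-boundary and append a comma
with_group <- gsub("(.)\\B", "\\1,", labels)

# Both approaches should agree on these labels
identical(with_perl, with_group)
```

Note that \B is defined in terms of word characters, so labels containing spaces or punctuation may not be split character by character the way the strsplit approach splits them.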
G. Grothendieck answered Oct 23 '25 08:10


