How to strsplit a column whose elements contain different numbers of strings, using the do function

Tags:

r

dplyr

plyr

I have a problem splitting a column's values when its elements contain different numbers of strings. I can do it in plyr, e.g.:

library(plyr)
column <- c("jake", "jane jane","john john john")
df <- data.frame(1:3, name = column)
df$name <- as.character(df$name)
df2 <- ldply(strsplit(df$name, " "), rbind)
View(df2)

As a result, we get a data frame whose number of columns equals the maximum number of strings in any element.

When I tried to do the same in dplyr, I used the do function:

library(dplyr)
df2 <- df %>%
  do(data.frame(strsplit(.$name, " ")))

but I get an error:

Error in data.frame("jake", c("jane", "jane"), c("john", "john", "john" : 
arguments imply differing number of rows: 1, 2, 3

It seems to me that the rbind function should be used somewhere, but I do not know where.

Nicolabo asked Dec 01 '14 21:12


1 Answer

You're having trouble because strsplit() returns a list whose elements have different lengths, so data.frame() cannot combine them into columns. We would need to apply as.data.frame.list() to each element to get it into the format that dplyr requires, and even then it would take a bit more work to get usable results. Long story short, this doesn't seem like a suitable operation for do().
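To make the shape problem concrete, here is a minimal base R sketch (not a dplyr solution). It pads each split result with NA up to a common length before binding the rows, which is roughly what ldply(..., rbind) does for you. The names parts, width, padded and res are only illustrative.

```r
# Example data from the question
name <- c("jake", "jane jane", "john john john")

# strsplit() returns a list with one character vector per element
parts <- strsplit(name, " ")
lens <- lengths(parts)  # number of pieces per element

# Pad every vector with NA up to the longest one; `length<-` does the padding
width <- max(lens)
padded <- lapply(parts, `length<-`, width)

# Now all rows have the same length, so they can be bound together
res <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
res
```

The columns get the default names V1, V2 and V3, with NA wherever an element had fewer pieces.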

I think you might be better off using separate() from tidyr. It can easily be used with dplyr functions and chains. It's not clear whether you want to keep the first column since your ldply result for df2 does not have it, so I left it off.

library(tidyr)
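# into must be a character vector in current tidyr; fill = "right" pads short rows with NA
separate(df[-1], name, as.character(1:3), " ", extra = "merge", fill = "right")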
#      1    2    3
# 1 jake <NA> <NA>
# 2 jane jane <NA>
# 3 john john john
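Since the question was about dplyr chains, here is a hedged sketch of the same idea inside a pipe, assuming reasonably recent versions of dplyr and tidyr. The number of target columns is computed from the data rather than hard-coded; the id column name, the name_ prefix, and the variables n_parts and into are my own choices, not something from the question.

```r
library(dplyr)
library(tidyr)

# Example data from the question (the id column name is an assumption)
df <- data.frame(id = 1:3,
                 name = c("jake", "jane jane", "john john john"),
                 stringsAsFactors = FALSE)

# Work out how many columns are needed for the longest element
n_parts <- max(lengths(strsplit(df$name, " ")))
into <- paste0("name_", seq_len(n_parts))

# fill = "right" pads shorter rows with NA on the right
df2 <- df %>%
  select(name) %>%
  separate(name, into = into, sep = " ", fill = "right")
df2
```

Drop the select() step if you want to keep the first column alongside the split pieces.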

You could also use cSplit() from splitstackshape. It is also very efficient since it relies on data.table.

library(splitstackshape)
cSplit(df[-1], "name", " ")
#    name_1 name_2 name_3
# 1:   jake     NA     NA
# 2:   jane   jane     NA
# 3:   john   john   john

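Or, to get the same column names as the separate() result:

# Assign first, then rename by reference; data.table:: makes the dependency explicit
df2 <- cSplit(df[-1], "name", " ")
data.table::setnames(df2, names(df2), as.character(1:3))
df2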
#       1    2    3
# 1: jake   NA   NA
# 2: jane jane   NA
# 3: john john john

Rich Scriven answered Oct 15 '22 23:10