I have a problem splitting a column's values when the elements of the column contain different numbers of strings. I can do it in plyr, e.g.:
library(plyr)
column <- c("jake", "jane jane","john john john")
df <- data.frame(1:3, name = column)
df$name <- as.character(df$name)
df2 <- ldply(strsplit(df$name, " "), rbind)
View(df2)
As a result, we get a data frame whose number of columns equals the maximum number of strings in any element.
When I try to do it in dplyr, using the do function:
library(dplyr)
df2 <- df %>%
do(data.frame(strsplit(.$name, " ")))
but I get an error:
Error in data.frame("jake", c("jane", "jane"), c("john", "john", "john" : arguments imply differing number of rows: 1, 2, 3
It seems to me that the rbind function should be used, but I do not know where.
You're having trouble because strsplit() returns a list, and as.data.frame.list() would then need to be applied to each element to get it into the format that dplyr requires. Even then, it would take a bit more work to get usable results. Long story short, this doesn't seem like a suitable operation for do().
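For reference, the rbind idea from the question can be made to work in base R if each split result is first padded with NA to a common length. This is only a minimal sketch under that assumption, not the plyr or dplyr way of doing it; the id column name is my own choice for illustration.

```r
# Example data from the question (the column name "id" is an assumption)
column <- c("jake", "jane jane", "john john john")
df <- data.frame(id = 1:3, name = column, stringsAsFactors = FALSE)

# Split each name on spaces; the result is a list of character vectors
parts <- strsplit(df$name, " ")

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, n - length(p))))

# Stack the padded vectors row-wise and convert to a data frame
df2 <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df2) <- as.character(seq_len(n))
df2
```

Padding first is what avoids the "differing number of rows" error, because every piece then has the same length before it is bound.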
I think you might be better off using separate() from tidyr. It can easily be used with dplyr functions and chains. It's not clear whether you want to keep the first column, since your ldply result for df2 does not have it, so I left it off.
library(tidyr)
separate(df[-1], name, into = as.character(1:3), sep = " ", extra = "merge", fill = "right")
# 1 2 3
# 1 jake <NA> <NA>
# 2 jane jane <NA>
# 3 john john john
You could also use cSplit from splitstackshape. It is also very efficient since it relies on data.table.
library(splitstackshape)
cSplit(df[-1], "name", " ")
# name_1 name_2 name_3
# 1: jake NA NA
# 2: jane jane NA
# 3: john john john
Or, to rename the columns so they match the tidyr output:
library(data.table)  # provides setnames()
df2 <- cSplit(df[-1], "name", " ")
setnames(df2, as.character(1:3))
df2
# 1 2 3
# 1: jake NA NA
# 2: jane jane NA
# 3: john john john