I am trying to split a data frame column into multiple columns on a set of delimiters (commas and spaces). I have found various answers on this site and am trying to get several of the approaches to work, but I'm having trouble with ldply: strsplit returns a list whose elements have different lengths. Here is some sample data, what has worked, and what I'm attempting with ldply.
FirstName <- c("a,b", "c d", "e, f", "gh")
OtherInfo <- c(1:4)
df <- data.frame(FirstName, OtherInfo, stringsAsFactors = FALSE)
print(df)
#Solution with cSplit
library(splitstackshape)
cs <- cSplit(df, "FirstName", "[, ]+", fixed = FALSE)
#Solution with strsplit and as.data.frame
#Feels like a hack, and I have "gh" repeated
#Question: Is there a better way using a similar approach?
df2 <- t(as.data.frame(strsplit(df$FirstName, "[, ]+", fixed = FALSE)))
row.names(df2) <- NULL
#Question: Solution with strsplit and plyr
library(plyr)
list1 <- strsplit(df$FirstName, "[, ]+", fixed = FALSE)
df3 <- ldply(list1)
Error:
#Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
# Results do not have equal lengths
I wrote this fix to insert NA values, but it doesn't feel like the best way. Is there a better way?
MAX <- max(sapply(list1, length))
func1 <- function(x, MAX) {
  # Pad x with NA so that every element has length MAX
  vec <- c(x, rep(NA, MAX - length(x)))
  return(vec)
}
list2 <- lapply(list1, func1, MAX = MAX)
list2
df3.1 <- ldply(list2)
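One possible base R alternative to the padding helper, offered as a sketch rather than a definitive answer: assigning a longer length to a vector with `length<-` pads it with NA, so the ragged list can be padded and row-bound without plyr. The names `padded`, `mat`, and `df3.2` are made up for this illustration.

```r
# Sample input, repeated here so the sketch runs on its own
FirstName <- c("a,b", "c d", "e, f", "gh")
list1 <- strsplit(FirstName, "[, ]+", fixed = FALSE)

# Longest element decides how many columns are needed
n <- max(lengths(list1))

# `length<-` extends each vector to length n, filling with NA
padded <- lapply(list1, `length<-`, n)

# Bind the equal-length vectors into a character matrix, one row per name
mat <- do.call(rbind, padded)

# Convert to a data frame (hypothetical name df3.2)
df3.2 <- as.data.frame(mat, stringsAsFactors = FALSE)
print(df3.2)
```

This keeps "gh" from being recycled into the second column, which is what happens in the as.data.frame approach.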
Here is one quick solution with dplyr.
library(dplyr)
df4 <- df %>%
  mutate(parts = strsplit(FirstName, "[, ]+", fixed = FALSE)) %>%
  group_by(FirstName) %>%
  do(data.frame(
    {
      # One named list entry per part: Firstname1, Firstname2, ...
      idx <- seq_along(.$parts[[1]])
      lst <- lapply(idx, function(x) .$parts[[1]][x])
      names(lst) <- paste("Firstname", idx, sep = "")
      lst
    },
    stringsAsFactors = FALSE
  )) %>%
  inner_join(df, by = "FirstName")
print(df4)
For the provided example, I get:
Source: local data frame [4 x 4]
Groups: FirstName

  FirstName Firstname1 Firstname2 OtherInfo
1       a,b          a          b         1
2       c d          c          d         2
3      e, f          e          f         3
4        gh         gh         NA         4
The logic of the solution is as follows:
1. Split each FirstName value into a list of parts.
2. For each FirstName, create a new data.frame whose values come from parts and whose variables are named Firstname1, Firstname2, etc.
3. Join the result back to the original data frame to restore OtherInfo.
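The same three steps can also be sketched in base R, which may make the logic easier to follow without the dplyr pipeline. This is only an illustration: the names `parts`, `wide`, and `result` are made up here, and it assumes the FirstName values are unique so that the merge matches one row to one row.

```r
FirstName <- c("a,b", "c d", "e, f", "gh")
OtherInfo <- c(1:4)
df <- data.frame(FirstName, OtherInfo, stringsAsFactors = FALSE)

# Step 1: split each FirstName value into its parts
parts <- strsplit(df$FirstName, "[, ]+", fixed = FALSE)
n <- max(lengths(parts))

# Step 2: build one single-row data frame per name; indexing past the
# end of a vector yields NA, so short names are padded automatically
wide <- do.call(rbind, lapply(parts, function(p) {
  row <- as.list(p[seq_len(n)])
  names(row) <- paste0("Firstname", seq_len(n))
  as.data.frame(row, stringsAsFactors = FALSE)
}))
wide$FirstName <- df$FirstName

# Step 3: merge back onto the original data to restore OtherInfo
result <- merge(wide, df, by = "FirstName")
print(result)
```

Note that merge() sorts the result by the key column, so the row order may differ from the input if the names are not already sorted.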