I'm working with the following type of dataset:
names<-c("Aname","Aname","Bname","Cname","Cname")
list <- list( c('a, b','b, r','c, g'), c('d,g','e,j'),
c('d, h','s, q','f,q'), c('d,r ','s, z'),c('d, r','d, r'))
data<-cbind(names, list)
I want to break out each element of the list and bind it to the corresponding "names" value. The dataset I'm trying to produce would look like this:
Column 1 Column 2
Aname a
Aname b
Aname b
Aname r
Aname c
There have been many discussions of how to convert a list to a data.frame, but I'm struggling to find any advice about how to do this "within" a data.frame, where I'd like to preserve identifiers on the same row as the list (in this case names). Many thanks!
You could use melt from reshape2:
library(reshape2)
melt(lapply(setNames(list, names), function(x)
unlist(strsplit(x, ', | |,'))))
Here's a possible base R solution:
myFunc <- function(x) unlist(strsplit(unlist(x), ", | |,"))
data.frame(Col1 = rep(names, sapply(list, function(x) length(myFunc(x)))),
Col2 = myFunc(list))
# Col1 Col2
# 1 Aname a
# 2 Aname b
# 3 Aname b
# 4 Aname r
# 5 Aname c
# 6 Aname g
# 7 Aname d
# 8 Aname g
# 9 Aname e
# 10 Aname j
# 11 Bname d
# 12 Bname h
# 13 Bname s
# 14 Bname q
# 15 Bname f
# 16 Bname q
# 17 Cname d
# 18 Cname r
# 19 Cname s
# 20 Cname z
# 21 Cname d
# 22 Cname r
# 23 Cname d
# 24 Cname r
One more approach with splitstackshape: its cSplit function strips whitespace adjacent to the delimiter by default.
library(splitstackshape)
lengths <- sapply(data[, 2], length)
nameslist <- unlist(rep(data[, 1], lengths))
df1 <- data.frame(names = nameslist, chars = unlist(data[, 2]))
cSplit(df1, "chars", sep = ",", direction = "long")
Or per Ananda's comment, simply:
cSplit(data.table(names = data[, "names"], list = sapply(data[, "list"], toString)),
"list", ",", "long")
Result:
names chars
1: Aname a
2: Aname b
3: Aname b
4: Aname r
5: Aname c
6: Aname g
7: Aname d
8: Aname g
9: Aname e
10: Aname j
11: Bname d
12: Bname h
13: Bname s
14: Bname q
15: Bname f
16: Bname q
17: Cname d
18: Cname r
19: Cname s
20: Cname z
21: Cname d
22: Cname r
23: Cname d
24: Cname r
If you don't want the result as a data.table, you can wrap the last line in as.data.frame().
Here is how to do it with dplyr/tidyr. The idea is to convert each element of list to a list itself (from a character vector, which it is currently) and then call the very useful unnest function:
library(dplyr)
library(tidyr)
data.frame(data) %>%
unnest(list) %>%
mutate(list = strsplit(list, ",")) %>%
unnest(list)
# names list
#1 Aname a
#2 Aname b
#3 Aname b
#4 Aname r
#5 Aname c
#6 Aname g
#7 Aname d
#8 Aname g
#9 Aname e
#10 Aname j
#11 Bname d
#12 Bname h
#13 Bname s
#14 Bname q
#15 Bname f
#16 Bname q
#17 Cname d
#18 Cname r
#19 Cname s
#20 Cname z
#21 Cname d
#22 Cname r
#23 Cname d
#24 Cname r
(To get rid of extra spaces, if needed, you can append %>% mutate(list = gsub(" ", "", list)) to the chain of commands.)
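For readers on a recent tidyr, the same unnest-then-split idea can be sketched with separate_rows(), which splits a delimited column into rows in one step. This is only a sketch under assumptions: it presumes tidyr 1.0 or later, builds the input as a tibble rather than from the cbind() matrix, and the names ids, values and chars are made up for the example.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question; `ids` and `values` are
# hypothetical names chosen here to avoid masking names() and list().
ids    <- c("Aname", "Aname", "Bname", "Cname", "Cname")
values <- list(c("a, b", "b, r", "c, g"), c("d,g", "e,j"),
               c("d, h", "s, q", "f,q"), c("d,r ", "s, z"), c("d, r", "d, r"))

long <- tibble(names = ids, chars = values) %>%
  unnest(chars) %>%                    # one row per string in each list element
  separate_rows(chars, sep = ",") %>%  # one row per comma-separated piece
  mutate(chars = trimws(chars))        # drop leading/trailing spaces

long
```

Using trimws() instead of gsub(" ", "", ...) only removes spaces at the ends of each piece, which matters if the values themselves could contain spaces.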
The OP lumps two questions together.
The answer to the first is to clean the data. For example, copying @DavidArenburg's function:
myFunc <- function(x) unlist(strsplit(unlist(x), ", | |,"))
clean <- sapply(list, myFunc)
And the second step is to stack:
stack(setNames(clean,names))
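Put together as a self-contained sketch, the clean-then-stack approach might look like the following. It departs from the snippet above in two ways, both my own assumptions rather than part of the answer: lapply() is used so the result stays a list even if every element happened to have the same length, and trimws() replaces the regex for whitespace handling. The names ids and values are hypothetical.

```r
# Sample data copied from the question; `ids` and `values` are
# hypothetical names chosen here to avoid masking names() and list().
ids    <- c("Aname", "Aname", "Bname", "Cname", "Cname")
values <- list(c("a, b", "b, r", "c, g"), c("d,g", "e,j"),
               c("d, h", "s, q", "f,q"), c("d,r ", "s, z"), c("d, r", "d, r"))

# Step 1: clean. Split every string on commas and trim stray whitespace.
clean <- lapply(values, function(x) trimws(unlist(strsplit(x, ","))))

# Step 2: stack. Produces a data.frame with columns `values` and `ind`,
# where `ind` carries the identifier for each piece.
long <- stack(setNames(clean, ids))

long
```

Note that stack() accepts repeated names, so the two "Aname" elements simply end up as consecutive rows with the same ind value.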