Break list into rows while preserving identifiers in R

I'm working with the following type of dataset:

    names<-c("Aname","Aname","Bname","Cname","Cname")
    list <- list( c('a, b','b, r','c, g'), c('d,g','e,j'),
    c('d, h','s, q','f,q'), c('d,r ','s, z'),c('d, r','d, r'))
    data<-cbind(names, list)

I want to break out each element of the list and bind it to the corresponding "names" value. The dataset I'm trying to produce would look like this:

Column 1   Column 2
Aname      a
Aname      b
Aname      b
Aname      r
Aname      c

There have been many discussions of how to convert a list to a data.frame, but I'm struggling to find any advice on how to do this "within" a data frame, where I'd like to keep the identifiers (in this case, names) on the same row as each list element. Many thanks!

Chris Bail asked May 08 '15 11:05


5 Answers

You could use melt from reshape2:

library(reshape2)
melt(lapply(setNames(list, names), function(x)
                      unlist(strsplit(x, ', | |,'))))
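
To see what the regular expression passed to strsplit is doing, here is a small base R sketch (no reshape2 needed) that applies the same pattern to a few of the sample strings. The variable names `samples` and `tokens` are illustrative and do not come from the question.

```r
# A few strings from the question, including the awkward ones
samples <- c("a, b", "d,g", "d,r ")

# Same pattern as above: split on ", " OR a lone space OR a lone comma
tokens <- strsplit(samples, ", | |,")

tokens[[1]]  # "a" "b"
tokens[[2]]  # "d" "g"
tokens[[3]]  # "d" "r"  (the trailing space is consumed as a delimiter)
```

Because ", " is listed first in the alternation, a comma followed by a space is treated as one delimiter rather than two, which avoids empty strings in the middle of the result.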
akrun answered Oct 18 '22 05:10


Here's a possible base R solution:

myFunc <- function(x) unlist(strsplit(unlist(x), ", | |,"))

data.frame(Col1 = rep(names, sapply(list, function(x) length(myFunc(x)))), 
           Col2 = myFunc(list))

#     Col1 Col2
# 1  Aname    a
# 2  Aname    b
# 3  Aname    b
# 4  Aname    r
# 5  Aname    c
# 6  Aname    g
# 7  Aname    d
# 8  Aname    g
# 9  Aname    e
# 10 Aname    j
# 11 Bname    d
# 12 Bname    h
# 13 Bname    s
# 14 Bname    q
# 15 Bname    f
# 16 Bname    q
# 17 Cname    d
# 18 Cname    r
# 19 Cname    s
# 20 Cname    z
# 21 Cname    d
# 22 Cname    r
# 23 Cname    d
# 24 Cname    r
David Arenburg answered Oct 18 '22 05:10


One more approach uses splitstackshape. Its cSplit function strips whitespace adjacent to the delimiter by default (stripWhite = TRUE), so splitting on "," alone is enough here.

library(splitstackshape)
lengths <- sapply(data[, 2], length)
nameslist <- unlist(rep(data[, 1], lengths))
df1 <- data.frame(names = nameslist, chars = unlist(data[, 2]))
cSplit(df1, "chars", sep = ",", direction = "long")

Or per Ananda's comment, simply:

library(data.table)  # for data.table(), in case splitstackshape does not attach it
cSplit(data.table(names = data[, "names"], list = sapply(data[, "list"], toString)),
       "list", ",", "long")

Result:

    names chars
 1: Aname     a
 2: Aname     b
 3: Aname     b
 4: Aname     r
 5: Aname     c
 6: Aname     g
 7: Aname     d
 8: Aname     g
 9: Aname     e
10: Aname     j
11: Bname     d
12: Bname     h
13: Bname     s
14: Bname     q
15: Bname     f
16: Bname     q
17: Cname     d
18: Cname     r
19: Cname     s
20: Cname     z
21: Cname     d
22: Cname     r
23: Cname     d
24: Cname     r

If you don't want the result as a data.table, you can wrap the last line in as.data.frame().
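
For readers without splitstackshape, the split-then-trim behaviour can be approximated in base R. This is only a sketch of the idea under my own assumptions, not how cSplit is implemented; `ids`, `items`, `pieces`, and `long` are illustrative names, and trimws() and lengths() need R 3.2.0 or later.

```r
ids   <- c("Aname", "Cname")
items <- list(c("a, b", "b, r"), c("d,r ", "s, z"))

# Split on the comma only, then trim surrounding whitespace from each piece
pieces <- lapply(items, function(x) trimws(unlist(strsplit(x, ",", fixed = TRUE))))

# Repeat each identifier once per piece so the two columns line up
long <- data.frame(names = rep(ids, lengths(pieces)),
                   chars = unlist(pieces))
long
```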

Sam Firke answered Oct 18 '22 05:10


Here is how to do it with dplyr/tidyr. The idea is to unnest the list column so that each string gets its own row, split each string with strsplit (which turns the column back into a list of character vectors), and then call the very useful unnest function again:

library(dplyr)
library(tidyr)
data.frame(data) %>% 
    unnest(list) %>% 
    mutate(list = strsplit(list, ",")) %>%
    unnest(list)
#   names list
#1  Aname    a
#2  Aname    b
#3  Aname    b
#4  Aname    r
#5  Aname    c
#6  Aname    g
#7  Aname    d
#8  Aname    g
#9  Aname    e
#10 Aname    j
#11 Bname    d
#12 Bname    h
#13 Bname    s
#14 Bname    q
#15 Bname    f
#16 Bname    q
#17 Cname    d
#18 Cname   r 
#19 Cname    s
#20 Cname    z
#21 Cname    d
#22 Cname    r
#23 Cname    d
#24 Cname    r

(To get rid of extra spaces, if needed, you can append %>% mutate(list = gsub(" ", "", list)) to the chain of commands.)
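
The two-stage flattening may be easier to follow when spelled out without the pipe. The following is a rough base R sketch of the same sequence (flatten, split, flatten again, then strip spaces); it is not what tidyr does internally, and `ids`, `items`, `step1`, `parts`, and `step2` are illustrative names.

```r
ids   <- c("Aname", "Bname")
items <- list(c("a, b", "b, r"), c("d,g"))

# Stage 1 (first unnest): one row per string
step1 <- data.frame(names = rep(ids, lengths(items)),
                    list  = unlist(items),
                    stringsAsFactors = FALSE)

# Stage 2 (strsplit + second unnest): one row per token,
# with the gsub() cleanup applied to remove stray spaces
parts <- strsplit(step1$list, ",")
step2 <- data.frame(names = rep(step1$names, lengths(parts)),
                    list  = gsub(" ", "", unlist(parts)),
                    stringsAsFactors = FALSE)
step2
```

Here step1 has three rows (one per string) and step2 has six (one per token), which mirrors how the row count grows after each unnest call.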

konvas answered Oct 18 '22 07:10


The OP lumps two questions together.

The answer to the first is to clean the data. For example, copying @DavidArenburg's function:

myFunc <- function(x) unlist(strsplit(unlist(x), ", | |,")) 
clean  <- lapply(list, myFunc)  # lapply always returns a list, which stack() needs

And the second step is to stack:

stack(setNames(clean,names))
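
As a small illustration of the stacking step, the sketch below uses already-cleaned toy data. stack() returns a data frame with columns named values and ind, so a reorder and rename is needed to get the layout from the question. The names `ids`, `stacked`, and `result` are illustrative, and `clean` here is a toy stand-in for the cleaned list above.

```r
ids   <- c("Aname", "Aname", "Bname")
clean <- list(c("a", "b"), c("d", "g"), c("h"))

# stack() needs a named list; repeated names are fine and simply
# map several list elements to the same identifier
stacked <- stack(setNames(clean, ids))

# Put the identifier first and rename the columns
result <- setNames(stacked[, c("ind", "values")], c("names", "list"))
result
```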
Frank answered Oct 18 '22 05:10