Something like expand.grid on a list of lists

Tags:

r

I have three text documents stored as a list of lists called "dlist":

dlist <- structure(list(name = c("a", "b", "c"), text = list(c("the", "quick", "brown"), c("fox", "jumps", "over", "the"), c("lazy", "dog"))), .Names = c("name", "text"))
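The `structure(..., .Names = ...)` form is what `dput()` prints. Here is a minimal base-R sketch showing that it is the same object as a plainly written named list. The name `dlist2` is hypothetical and only used for the comparison:

```r
# dlist exactly as given above (dput() output)
dlist <- structure(list(name = c("a", "b", "c"),
                        text = list(c("the", "quick", "brown"),
                                    c("fox", "jumps", "over", "the"),
                                    c("lazy", "dog"))),
                   .Names = c("name", "text"))

# The same object written as an ordinary named list (illustrative name)
dlist2 <- list(name = c("a", "b", "c"),
               text = list(c("the", "quick", "brown"),
                           c("fox", "jumps", "over", "the"),
                           c("lazy", "dog")))

identical(dlist, dlist2)  # TRUE: .Names just sets the names attribute
str(dlist)                # a character vector plus a list of 3 character vectors
```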

In my head I find it helpful to picture dlist like this:

   name  text
1  a     c("the", "quick", "brown")
2  b     c("fox", "jumps", "over", "the")
3  c     c("lazy", "dog")
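That mental picture can be built literally as a data frame with a list column. This is a minimal sketch in base R, assuming nothing beyond the question's data; the name `pictured` is hypothetical. It still has one row per document, which is why a further reshaping step is needed:

```r
dlist <- list(name = c("a", "b", "c"),
              text = list(c("the", "quick", "brown"),
                          c("fox", "jumps", "over", "the"),
                          c("lazy", "dog")))

# Start from the name column, then attach the list of word vectors as a
# list column: each cell holds one whole character vector
pictured <- data.frame(name = dlist$name, stringsAsFactors = FALSE)
pictured$text <- dlist$text

nrow(pictured)          # 3: still one row per document
lengths(pictured$text)  # 3 4 2 words per document
```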

How can this be reshaped to look like the table below? The idea is to graph it, so something that can be melted for ggplot2 would be good.

  name  text
1    a   the
2    a quick
3    a brown
4    b   fox
5    b jumps
6    b  over
7    b   the
8    c  lazy
9    c   dog

That's one row per word, giving both the word and its parent document.

I have tried:

> expand.grid(dlist)
  name                  text
1    a     the, quick, brown
2    b     the, quick, brown
3    c     the, quick, brown
4    a fox, jumps, over, the
5    b fox, jumps, over, the
6    c fox, jumps, over, the
7    a             lazy, dog
8    b             lazy, dog
9    c             lazy, dog
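The nine rows come from `expand.grid()` building every combination of its inputs (a Cartesian product): 3 names times 3 documents. It never pairs elements position by position. A small sketch with simple vectors, where the column name `doc` is only illustrative:

```r
# expand.grid() returns all combinations; the first column varies fastest
combos <- expand.grid(name = c("a", "b", "c"), doc = 1:3)

nrow(combos)               # 9 = 3 names x 3 documents
as.character(combos$name)  # "a" "b" "c" "a" "b" "c" "a" "b" "c"
combos$doc                 # 1 1 1 2 2 2 3 3 3
```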

> sapply(seq(1,3), function(x) (expand.grid(dlist$name[[x]], dlist$text[[x]])))
     [,1]     [,2]     [,3]    
Var1 factor,3 factor,4 factor,2
Var2 factor,3 factor,4 factor,2
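The odd `factor,3` matrix is `sapply()` simplifying. Each call returns a two-column data frame, and a data frame's length is its number of columns, so `sapply()` sees three results of length 2 and packs their columns into a 2 x 3 list-matrix. A minimal sketch of that diagnosis, followed by one possible fix (turning simplification off and stacking the pieces with `rbind()`); `res`, `pieces` and `long` are illustrative names:

```r
dlist <- list(name = c("a", "b", "c"),
              text = list(c("the", "quick", "brown"),
                          c("fox", "jumps", "over", "the"),
                          c("lazy", "dog")))

# Same call as in the question: the result is a list-matrix of factor columns
res <- sapply(seq(1, 3), function(x) expand.grid(dlist$name[[x]], dlist$text[[x]]))
dim(res)           # 2 3
class(res[[1, 1]]) # "factor" (expand.grid() turns strings into factors by default)

# Keep the three data frames intact instead, then stack them row-wise
pieces <- sapply(seq(1, 3), function(x)
  expand.grid(name = dlist$name[[x]], text = dlist$text[[x]],
              stringsAsFactors = FALSE),
  simplify = FALSE)
long <- do.call(rbind, pieces)
long
```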

> unlist(dlist)
  name1   name2   name3   text1   text2   text3   text4 
    "a"     "b"     "c"   "the" "quick" "brown"   "fox" 
  text5   text6   text7   text8   text9 
"jumps"  "over"   "the"  "lazy"   "dog"

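`unlist()` is close: applied to the `text` component alone it gives the words in document order, and the only missing piece is repeating each name once per word. A minimal base-R sketch of that idea, assuming R 3.2.0 or later for `lengths()`; `words`, `n_words` and `long` are illustrative names:

```r
dlist <- list(name = c("a", "b", "c"),
              text = list(c("the", "quick", "brown"),
                          c("fox", "jumps", "over", "the"),
                          c("lazy", "dog")))

words   <- unlist(dlist$text)   # all 9 words, in document order
n_words <- lengths(dlist$text)  # 3 4 2: words per document

# Repeat each document name as many times as it has words
long <- data.frame(name = rep(dlist$name, times = n_words),
                   text = words,
                   stringsAsFactors = FALSE)
long
```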
> sapply(seq(1,3), function(x) (cbind(dlist$name[[x]], dlist$text[[x]])))
[[1]]
     [,1] [,2]   
[1,] "a"  "the"  
[2,] "a"  "quick"
[3,] "a"  "brown"

[[2]]
     [,1] [,2]   
[1,] "b"  "fox"  
[2,] "b"  "jumps"
[3,] "b"  "over" 
[4,] "b"  "the"  

[[3]]
     [,1] [,2]  
[1,] "c"  "lazy"
[2,] "c"  "dog" 
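This second attempt is already most of the way there. Because the three matrices have different lengths, `sapply()` cannot simplify them and returns a plain list, so stacking them with `rbind()` gives the long table. A minimal sketch, using `lapply()` to make the list explicit; `mats`, `m` and `long` are illustrative names:

```r
dlist <- list(name = c("a", "b", "c"),
              text = list(c("the", "quick", "brown"),
                          c("fox", "jumps", "over", "the"),
                          c("lazy", "dog")))

# cbind() recycles the single name down the column of words
mats <- lapply(seq_along(dlist$name),
               function(x) cbind(dlist$name[[x]], dlist$text[[x]]))

# Stack the 3 x 2, 4 x 2 and 2 x 2 matrices into one 9 x 2 character matrix
m <- do.call(rbind, mats)

# Convert to the two-column data frame shown in the question
long <- data.frame(name = m[, 1], text = m[, 2], stringsAsFactors = FALSE)
long
```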

It's fair to say I'm befuddled by the various apply and plyr functions and don't really know where to start. I've never seen a result like the one from the first sapply attempt above, and I don't understand it.

asked May 01 '13 20:05 by nacnudus


1 Answer

If you convert your dlist to a named list (a better-suited structure, in my opinion), you can use stack() to get the two-column data.frame you want.

(The rev() and setNames() calls in the second line are just one of many ways to adjust the column ordering and names to match the desired output shown in your question.)

x <- setNames(dlist$text, dlist$name)
setNames(rev(stack(x)),  c("name", "text"))
#   name  text
# 1    a   the
# 2    a quick
# 3    a brown
# 4    b   fox
# 5    b jumps
# 6    b  over
# 7    b   the
# 8    c  lazy
# 9    c   dog
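To see what `rev()` and `setNames()` are adjusting, it can help to look at `stack()`'s raw result: a data frame with columns `values` and `ind`, where `ind` is a factor built from the list names. A minimal sketch that selects the columns by name instead of reversing them; `s` and `long` are illustrative names, and the final `as.character()` is optional (a factor is generally fine as a discrete grouping variable in ggplot2):

```r
dlist <- list(name = c("a", "b", "c"),
              text = list(c("the", "quick", "brown"),
                          c("fox", "jumps", "over", "the"),
                          c("lazy", "dog")))

x <- setNames(dlist$text, dlist$name)  # named list: a, b, c
s <- stack(x)
names(s)          # "values" "ind"
is.factor(s$ind)  # TRUE

# Put the columns in the desired order and rename them
long <- setNames(s[c("ind", "values")], c("name", "text"))
long$name <- as.character(long$name)  # optional: plain strings instead of a factor
long
```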
answered Sep 19 '22 00:09 by Josh O'Brien