I have three text documents stored in a list called "dlist", which holds a character vector of document names and a list of word vectors:
dlist <- structure(list(name = c("a", "b", "c"), text = list(c("the", "quick", "brown"), c("fox", "jumps", "over", "the"), c("lazy", "dog"))), .Names = c("name", "text"))
In my head I find it helpful to picture dlist like this:
name text
1 a c("the", "quick", "brown")
2 b c("fox", "jumps", "over", "the")
3 c c("lazy", "dog")
How can this be reshaped to look like the table below? The idea is to graph it, so something that can be melted for ggplot2 would be good.
name text
1 a the
2 a quick
3 a brown
4 b fox
5 b jumps
6 b over
7 b the
8 c lazy
9 c dog
That's one row per word, giving both the word and its parent document.
I have tried:
> expand.grid(dlist)
name text
1 a the, quick, brown
2 b the, quick, brown
3 c the, quick, brown
4 a fox, jumps, over, the
5 b fox, jumps, over, the
6 c fox, jumps, over, the
7 a lazy, dog
8 b lazy, dog
9 c lazy, dog
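For context on why this happens: expand.grid() returns every combination of its inputs (a Cartesian product), so three names crossed with three text entries give 3 x 3 = 9 rows, and each name ends up paired with every document's words. A minimal sketch with plain vectors (the column name `doc` is purely illustrative):

```r
# expand.grid() forms the Cartesian product of its arguments:
# the first argument varies fastest, and every value of it is
# paired with every value of the second argument.
combos <- expand.grid(name = c("a", "b", "c"), doc = 1:3,
                      stringsAsFactors = FALSE)

nrow(combos)   # 9, i.e. 3 names x 3 documents
combos$name    # "a" "b" "c" "a" "b" "c" "a" "b" "c"
```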
> sapply(seq(1,3), function(x) (expand.grid(dlist$name[[x]], dlist$text[[x]])))
[,1] [,2] [,3]
Var1 factor,3 factor,4 factor,2
Var2 factor,3 factor,4 factor,2
> unlist(dlist)
name1 name2 name3 text1 text2 text3 text4
"a" "b" "c" "the" "quick" "brown" "fox"
text5 text6 text7 text8 text9
"jumps" "over" "the" "lazy" "dog"
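The unlist() attempt is close: unlisting only the text element gives the word column, and the name column can be rebuilt by repeating each name as many times as its document has words. A minimal base-R sketch, assuming dlist exactly as defined above (lengths() needs R 3.2.0 or later; sapply(dlist$text, length) does the same job on older versions):

```r
# dlist as defined in the question
dlist <- structure(list(name = c("a", "b", "c"),
                        text = list(c("the", "quick", "brown"),
                                    c("fox", "jumps", "over", "the"),
                                    c("lazy", "dog"))),
                   .Names = c("name", "text"))

# All words in order, one element per word (9 in total)
words <- unlist(dlist$text)

# lengths() gives the word count of each document, c(3, 4, 2) here,
# so rep() repeats each name once per word in its document
doc_names <- rep(dlist$name, lengths(dlist$text))

long <- data.frame(name = doc_names, text = words,
                   stringsAsFactors = FALSE)
long
```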
> sapply(seq(1,3), function(x) (cbind(dlist$name[[x]], dlist$text[[x]])))
[[1]]
[,1] [,2]
[1,] "a" "the"
[2,] "a" "quick"
[3,] "a" "brown"
[[2]]
[,1] [,2]
[1,] "b" "fox"
[2,] "b" "jumps"
[3,] "b" "over"
[4,] "b" "the"
[[3]]
[,1] [,2]
[1,] "c" "lazy"
[2,] "c" "dog"
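The cbind() attempt already produces the right rows; they are just split across three matrices, because the documents have different lengths and sapply() cannot simplify them into a single array. Binding the pieces together with do.call(rbind, ...) finishes the job. A sketch, using lapply() so the result is always a plain list:

```r
# dlist as defined in the question
dlist <- structure(list(name = c("a", "b", "c"),
                        text = list(c("the", "quick", "brown"),
                                    c("fox", "jumps", "over", "the"),
                                    c("lazy", "dog"))),
                   .Names = c("name", "text"))

# One character matrix per document; cbind() recycles the single
# document name down the length of that document's word vector
pieces <- lapply(seq_along(dlist$name), function(i) {
  cbind(name = dlist$name[[i]], text = dlist$text[[i]])
})

# Stack the matrices row-wise, then convert to a data frame
long <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
long
```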
It's fair to say I'm befuddled by the various apply and plyr functions and don't really know where to start. I've never seen a result like the one from the first "sapply" attempt above, and I don't understand it.
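The unfamiliar output from the first sapply() attempt comes from sapply() trying to simplify its result. Each call returns a two-column data frame, which R stores as a list of length 2, so sapply() packs the three results into a 2 x 3 matrix whose cells are entire columns (factors here, because expand.grid() converts strings to factors by default). A small sketch for inspecting it, assuming dlist as defined above:

```r
# dlist as defined in the question
dlist <- structure(list(name = c("a", "b", "c"),
                        text = list(c("the", "quick", "brown"),
                                    c("fox", "jumps", "over", "the"),
                                    c("lazy", "dog"))),
                   .Names = c("name", "text"))

res <- sapply(seq(1, 3), function(x) expand.grid(dlist$name[[x]], dlist$text[[x]]))

dim(res)        # 2 3: rows are the columns Var1/Var2, columns are documents
typeof(res)     # "list": each cell holds a whole column vector
as.character(res[[2, 2]])  # row Var2, document 2: "fox" "jumps" "over" "the"

# lapply() never simplifies, so each data frame stays intact
lst <- lapply(seq(1, 3), function(x) expand.grid(dlist$name[[x]], dlist$text[[x]]))
class(lst[[1]]) # "data.frame"
```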
If you convert your dlist to a named list (a better-suited structure, in my opinion), you can use stack() to get the two-column data.frame you want. (The rev() and setNames() calls in the second line are just one of many ways to adjust the column ordering and names to match the desired output shown in your question.)
x <- setNames(dlist$text, dlist$name)
setNames(rev(stack(x)), c("name", "text"))
# name text
# 1 a the
# 2 a quick
# 3 a brown
# 4 b fox
# 5 b jumps
# 6 b over
# 7 b the
# 8 c lazy
# 9 c dog
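To see why the rev() and setNames() calls are there, it may help to look at what stack() returns on its own: the words come first, in a column called values, and the document labels second, in a factor column called ind. A quick check, assuming dlist as defined in the question:

```r
# dlist as defined in the question
dlist <- structure(list(name = c("a", "b", "c"),
                        text = list(c("the", "quick", "brown"),
                                    c("fox", "jumps", "over", "the"),
                                    c("lazy", "dog"))),
                   .Names = c("name", "text"))
x <- setNames(dlist$text, dlist$name)

s <- stack(x)
names(s)          # "values" "ind"
is.factor(s$ind)  # TRUE: the document labels become a factor

# rev() on a data frame reverses its columns, putting "ind" first
names(rev(s))     # "ind" "values"
```

Because the name column comes back as a factor, it can be used directly as a grouping or fill variable in ggplot2; wrap it in as.character() if a plain character column is preferred.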