I have a list of character vectors. The first string of each vector is the category that the remaining strings belong to. How can I turn this list into a long-format data frame with one column for the category and one for the content?
mylist <- list(
c("A","lorem","ipsum"),
c("B","sed", "eiusmod", "tempor" ,"inci"),
c("C","aliq", "ex", "ea"))
> mylist
[[1]]
[1] "A" "lorem" "ipsum"
[[2]]
[1] "B" "sed" "eiusmod" "tempor" "inci"
[[3]]
[1] "C" "aliq" "ex" "ea"
The result should look like this data frame:
mydf <- data.frame(cate= c("A","A","B","B","B","B","C","C","C"),
cont= c("lorem","ipsum","sed", "eiusmod", "tempor","inci","aliq", "ex", "ea"))
> mydf
cate cont
1 A lorem
2 A ipsum
3 B sed
4 B eiusmod
5 B tempor
6 B inci
7 C aliq
8 C ex
9 C ea
I've already separated the categories and the content:
cate <- sapply(mylist, "[[",1)
cont <- sapply(mylist, "[", -(1))
How do I proceed from here to get mydf?
Using your original list and not the split objects you created, you can try the following:
library(data.table)
setorder(melt(as.data.table(transpose(mylist)),
id.vars = "V1", na.rm = TRUE), V1, variable)[]
# V1 variable value
# 1: A V2 lorem
# 2: A V3 ipsum
# 3: B V2 sed
# 4: B V3 eiusmod
# 5: B V4 tempor
# 6: B V5 inci
# 7: C V2 aliq
# 8: C V3 ex
# 9: C V4 ea
For fun, you can also try one of the following:
library(dplyr)
library(tidyr)
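# tibble() replaces the deprecated data_frame()
tibble(id = seq_along(mylist), mylist) %>%
  unnest(mylist) %>%
  group_by(id) %>%
  mutate(ind = mylist[1]) %>%  # first string in each group is the category
  slice(2:n())                 # drop the category row itself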
library(purrr)
library(tibble)
# tibble() replaces the deprecated data_frame()
tibble(
  value = mylist %>% map(~ .x[-1]) %>% unlist,
  ind = mylist %>% map(~ rep(.x[1], length(.x) - 1)) %>% unlist
)
Note that you will be annoyed by the fact that "purrr" also has a transpose function, which means if you have "data.table" loaded as well, you will have to get into the habit of using things like data.table::transpose or purrr::transpose if you are using those functions (like I did in the original answer). I haven't tested, but my guess is that "data.table" would still be the fastest starting from your original list.
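To illustrate the namespace point, here is a minimal sketch of the data.table approach with every call qualified explicitly, so it does not matter whether "purrr" is attached as well. It assumes "data.table" is installed (the guard simply skips the example otherwise), and the object names wide and long are only illustrative.

```r
mylist <- list(
  c("A", "lorem", "ipsum"),
  c("B", "sed", "eiusmod", "tempor", "inci"),
  c("C", "aliq", "ex", "ea")
)

# Assumption: data.table is installed; otherwise the block is skipped.
if (requireNamespace("data.table", quietly = TRUE)) {
  # The explicit namespace selects data.table's transpose(), even if purrr
  # has been attached and masks the unqualified name.
  wide <- data.table::as.data.table(data.table::transpose(mylist))

  # Melt to long format, dropping the NA padding added for shorter vectors.
  long <- data.table::melt(wide, id.vars = "V1", na.rm = TRUE)

  # Sort by category, then by original position within the category.
  data.table::setorder(long, V1, variable)
  print(long)
}
```

Qualifying each call is more verbose than relying on load order, but it keeps the result independent of which package was attached last.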
We can use stack after naming the list elements of 'cont' with 'cate'.
setNames(stack(setNames(cont, cate))[2:1], c('cate', 'cont'))
# cate cont
#1 A lorem
#2 A ipsum
#3 B sed
#4 B eiusmod
#5 B tempor
#6 B inci
#7 C aliq
#8 C ex
#9 C ea
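For readers who want to see what each step of the stack approach does, here is a sketch in base R that spells out the intermediate objects. It starts from the original list and uses lapply instead of sapply for 'cont'; that swap is my own assumption-driven choice, because sapply would simplify to a matrix if every vector happened to have the same length. The names named and long are only illustrative.

```r
mylist <- list(
  c("A", "lorem", "ipsum"),
  c("B", "sed", "eiusmod", "tempor", "inci"),
  c("C", "aliq", "ex", "ea")
)

# First element of each vector is the category.
cate <- sapply(mylist, `[[`, 1)

# Remaining elements are the content; lapply always returns a list.
cont <- lapply(mylist, `[`, -1)

# Name each content vector after its category.
named <- setNames(cont, cate)

# stack() returns two columns: 'values' (the strings) and 'ind' (the names,
# stored as a factor).
long <- stack(named)

# Reorder the columns, rename them, and turn the factor back into strings.
mydf <- setNames(long[2:1], c("cate", "cont"))
mydf$cate <- as.character(mydf$cate)
print(mydf)
```

The final as.character() call is optional; leave it out if a factor column for the category is what you want.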