
get long format data frame from list

Tags: list, dataframe, r

I have a list of character vectors. The first string in each vector names the category to which the remaining strings belong. I want a long-format data frame with one column for the category and one for the content. How can I get such a data frame from this list:

mylist <- list(
  c("A","lorem","ipsum"),
  c("B","sed", "eiusmod", "tempor" ,"inci"),
  c("C","aliq", "ex", "ea"))

> mylist
[[1]]
[1] "A"     "lorem" "ipsum"

[[2]]
[1] "B"       "sed"     "eiusmod" "tempor"  "inci"

[[3]]
[1] "C"    "aliq" "ex"   "ea"

It should look like this data frame

mydf <- data.frame(cate= c("A","A","B","B","B","B","C","C","C"),
               cont= c("lorem","ipsum","sed", "eiusmod", "tempor","inci","aliq", "ex", "ea"))

> mydf
  cate    cont
1    A   lorem
2    A   ipsum
3    B     sed
4    B eiusmod
5    B  tempor
6    B    inci
7    C    aliq
8    C      ex
9    C      ea

I've already separated the categories and the content.

cate <- sapply(mylist, "[[",1)
cont <- sapply(mylist, "[", -(1))

How do I proceed from here to get mydf?

bill.meiner asked Feb 12 '16 11:02


2 Answers

Using your original list and not the split objects you created, you can try the following:

library(data.table)
# Qualify transpose() explicitly so it cannot be confused with purrr::transpose
setorder(melt(as.data.table(data.table::transpose(mylist)), 
              id.vars = "V1", na.rm = TRUE), V1, variable)[]
#    V1 variable   value
# 1:  A       V2   lorem
# 2:  A       V3   ipsum
# 3:  B       V2     sed
# 4:  B       V3 eiusmod
# 5:  B       V4  tempor
# 6:  B       V5    inci
# 7:  C       V2    aliq
# 8:  C       V3      ex
# 9:  C       V4      ea

For fun, you can also try one of the following:


library(dplyr)
library(tidyr)

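# tibble() replaces the deprecated data_frame()
tibble(id = seq_along(mylist), mylist) %>%
  unnest(mylist) %>%               # one row per string
  group_by(id) %>%
  mutate(ind = mylist[1]) %>%      # first string of each group is the category
  slice(-1)                        # drop the category row itself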

library(purrr)
# tibble() replaces the deprecated data_frame()
tibble(
  value = mylist %>% map(~ .x[-1]) %>% unlist,                      # content strings
  ind = mylist %>% map(~ rep(.x[1], length(.x) - 1)) %>% unlist     # matching categories
)

Note that "purrr" also has a transpose function. If "data.table" is loaded as well, you will have to get into the habit of writing data.table::transpose or purrr::transpose explicitly when using those functions (like I did in the original answer). I haven't tested it, but my guess is that "data.table" would still be the fastest approach when starting from your original list.
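To make the namespace point concrete, here is the "data.table" approach from the start of this answer unpacked into separate steps, with transpose qualified explicitly. This is only a sketch: the intermediate names wide and long are made up for illustration, and it assumes the mylist from the question.

```r
library(data.table)

mylist <- list(
  c("A", "lorem", "ipsum"),
  c("B", "sed", "eiusmod", "tempor", "inci"),
  c("C", "aliq", "ex", "ea"))

# data.table::transpose() turns the list "sideways": element i of every
# vector ends up in column Vi, and shorter vectors are padded with NA.
# The explicit namespace avoids picking up purrr::transpose by accident.
wide <- as.data.table(data.table::transpose(mylist))

# Melt to long format, keeping the category column V1 as the id;
# na.rm = TRUE drops the NA padding again.
long <- melt(wide, id.vars = "V1", na.rm = TRUE)

# Sort by category, then by original position within the category.
setorder(long, V1, variable)
long
```

The variable column only records the original position (V2, V3, ...) and can be dropped once the rows are sorted.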

A5C1D2H2I1M1N2O1R2T1 answered Sep 28 '22 01:09


We can use stack after naming the list elements of 'cont' with 'cate'.

setNames(stack(setNames(cont, cate))[2:1], c('cate', 'cont'))
#  cate    cont
#1    A   lorem
#2    A   ipsum
#3    B     sed
#4    B eiusmod
#5    B  tempor
#6    B    inci
#7    C    aliq
#8    C      ex
#9    C      ea
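
For comparison, the same result can be built by hand in base R, which shows what stack is doing: repeat each category once per content string and flatten the content. This is a sketch rather than a drop-in replacement. It swaps the question's sapply calls for vapply and lapply, because sapply(mylist, "[", -1) would simplify to a matrix if every vector happened to have the same length.

```r
mylist <- list(
  c("A", "lorem", "ipsum"),
  c("B", "sed", "eiusmod", "tempor", "inci"),
  c("C", "aliq", "ex", "ea"))

# Category = first element of each vector (always a character vector).
cate <- vapply(mylist, `[[`, character(1), 1)
# Content = everything except the first element (always a list).
cont <- lapply(mylist, `[`, -1)

# Repeat each category once per content string, then flatten the content.
mydf <- data.frame(
  cate = rep(cate, lengths(cont)),
  cont = unlist(cont),
  stringsAsFactors = FALSE)
mydf
```

Unlike stack, this keeps 'cate' as a plain character column instead of a factor.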

akrun answered Sep 27 '22 23:09