
Converting data frame into deeply nested list

Tags: r, purrr, r-whisker

I'm trying to create the data structure that the whisker package expects, and I can't figure out how to build it from my data frame. Let's say I have the following data frame:

library(dplyr)  

existing_format <- 
  mtcars %>% 
    select(carb, gear, cyl) %>% 
    arrange(carb, gear, cyl) %>% 
    distinct() 

...I would like to go from existing_format to the following desired format (only the first two elements of the desired_format list are shown):

desired_format <- list(
  list( 
    carb = "1",
    gear = list(
      list(gear = "3", cyl = list(list(cyl = "4"), list(cyl = "6"))),
      list(gear = "4", cyl = list(list(cyl = "4")))
    )
  ),
  list( 
    carb = "2",
    gear = list(
      list(gear = "3", cyl = list(list(cyl = "8"))),
      list(gear = "4", cyl = list(list(cyl = "4"))),
      list(gear = "5", cyl = list(list(cyl = "4")))
    )
  )
)

I've tried things like grouping by carb and gear and then using tidyr::nest() to create a nested data frame, but nothing works. Something tells me that whisker::iteratelist() or whisker::rowSplit() is the way forward, but I can't figure it out.

Thanks, Chris

asked Dec 13 '17 at 21:12 by Chris




2 Answers

Perhaps more flexible than it needs to be in this case, but you can do a recursive split:

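rsplit <- function(dd) {
  col <- names(dd)[1]   # name of the current (leftmost) column
  dat <- dd[[1]]        # its values
  xx <- lapply(unique(dat), function(x) {
    z <- setNames(list(x), col)   # e.g. list(carb = 1)
    if (ncol(dd) > 1) {
      # recurse on the rows matching x, dropping the current column;
      # the child list is stored under the next column's name
      z[[names(dd)[2]]] <- rsplit(dd[dat == x, -1, drop = FALSE])
    }
    z
  })
  xx
}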

rsplit(existing_format)

This will split on all the columns and use the names from the column headers.
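`rsplit()` keeps the column types, so on `existing_format` the leaf values come out numeric, whereas `desired_format` in the question uses strings. A minimal sketch, assuming you want exactly the string-valued structure from the question: convert every column to character before splitting. The data frame is rebuilt here with base R (`order()` plus `unique()`, assumed equivalent to the question's `arrange()`/`distinct()` pipeline) so the snippet runs without dplyr; `chr_format` and `output` are illustrative names.

```r
# Base-R stand-in for the question's dplyr pipeline: sort rows, then drop duplicates
existing_format <- unique(mtcars[order(mtcars$carb, mtcars$gear, mtcars$cyl),
                                 c("carb", "gear", "cyl")])

# Recursive split from the answer above
rsplit <- function(dd) {
  col <- names(dd)[1]
  dat <- dd[[1]]
  lapply(unique(dat), function(x) {
    z <- setNames(list(x), col)
    if (ncol(dd) > 1) {
      z[[names(dd)[2]]] <- rsplit(dd[dat == x, -1, drop = FALSE])
    }
    z
  })
}

# desired_format holds strings, so convert every column to character first
chr_format <- as.data.frame(lapply(existing_format, as.character),
                            stringsAsFactors = FALSE)
output <- rsplit(chr_format)

str(output[[1]])
```

With `desired_format` from the question defined, `identical(output[1:2], desired_format)` should then hold.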

answered Sep 28 '22 at 19:09 by MrFlick


Here's a way that isn't general for n columns, but it works for these 3:

library(purrr)
library(magrittr)
library(dplyr)

output <- existing_format                           %>%
    map_df(as.character)                            %>%
    group_by(carb,gear)                             %>%
    summarize_at("cyl",~lst(map(.,~lst(cyl = .x)))) %>%
    mutate(gear = map2(.x = gear,.y = cyl,~lst(gear = .x,cyl = .y))) %>%
    group_by(carb)                                  %>%
    summarize_at("gear",~lst(gear=.))               %$%
    map2(.x = carb,.y = gear,~lst(carb = .x,gear = .y))

identical(output[1:2],desired_format) #TRUE
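
For comparison, a minimal sketch of the same fixed three-level construction in plain base R, swapping purrr's `map()` and the grouped summaries for nested `split()` and `lapply()` calls. Like the pipeline above, it is hard-coded for carb/gear/cyl and assumes the values should be character strings; `split_ordered()` is a hypothetical helper introduced only for this illustration.

```r
# Base-R stand-in for existing_format, with every column converted to character
existing_format <- unique(mtcars[order(mtcars$carb, mtcars$gear, mtcars$cyl),
                                 c("carb", "gear", "cyl")])
chr_format <- as.data.frame(lapply(existing_format, as.character),
                            stringsAsFactors = FALSE)

# Hypothetical helper: split a data frame by one column, keeping groups in
# first-appearance order instead of alphabetical factor order
split_ordered <- function(d, col) {
  split(d, factor(d[[col]], levels = unique(d[[col]])))
}

output <- unname(lapply(split_ordered(chr_format, "carb"), function(d_carb) {
  list(
    carb = d_carb$carb[1],
    gear = unname(lapply(split_ordered(d_carb, "gear"), function(d_gear) {
      list(
        gear = d_gear$gear[1],
        # innermost level: one list(cyl = ...) per remaining row
        cyl  = lapply(d_gear$cyl, function(v) list(cyl = v))
      )
    }))
  )
}))
```

`unname()` strips the group labels that `split()` attaches, so each level is an unnamed list of named lists, the same shape as `desired_format`.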

answered Sep 28 '22 at 17:09 by Moody_Mudskipper