 

Extract data frames from nested list

Tags:

r

I have a nested list of lists that contains some data frames, but the data frames can appear at any level of the list. What I want to end up with is a flat list (i.e. just one level) whose elements are only the data frames, with everything else discarded.

I have come up with a solution for this, but it looks very clunky and I am sure there ought to be a more elegant solution.

Importantly, I'm looking for something in base R that can extract data frames at any level of the nested list. I have tried unlist() and dabbled with rapply(), but have not found a satisfying solution.
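One reason plain unlist() does not help: a data frame is itself stored as a list of columns, so a tool that flattens lists treats a data frame as one more level of nesting rather than as a leaf. A small base R illustration (the objects df and flat are only for this demo):

```r
# A data frame is stored as a list of equal-length columns
df <- data.frame(cats = c(1, 2), pigs = c(3, 4))
is.list(df)    # TRUE
typeof(df)     # "list"

# Flattening by just one level already splits the data frame into its columns
flat <- unlist(list(x4 = df), recursive = FALSE)
names(flat)                 # "x4.cats" "x4.pigs"
is.data.frame(flat[[1]])    # FALSE
```

This is also why any solution has to test is.data.frame() before is.list(), and protect each data frame (for example by wrapping it in list()) before flattening.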

Example code follows: an example list, the result I am trying to achieve, and my own solution, which I am not very happy with. Thanks for any help!

# extract dfs from list

# example of multi-level list with some dfs in it
# note, dfs could be nested at any level
problem1 <- list(x1 = 1,
              x2 = list(
                x3 = "dog",
                x4 = data.frame(cats = c(1, 2),
                               pigs = c(3, 4))
              ),
              x5 = data.frame(sheep = c(1,2,3),
                             goats = c(4,5,6)),
              x6 = list(a = 2,
                       b = "c"),
              x7 = head(cars,5))

# want to end up with flat list like this (names format is optional)
result1 <- list(x2.x4 = data.frame(cats = c(1, 2),
                                   pigs = c(3, 4)),
                x5 = data.frame(sheep = c(1,2,3),
                                goats = c(4,5,6)),
                x7 = head(cars,5))

# my solution (not very satisfactory)

exit_loop <- FALSE
while (!exit_loop) {
  # find dfs (logical); vapply() returns logical(0) rather than list() for an
  # empty list, so the loop also terminates when no dfs are found at all
  idfs <- vapply(problem1, is.data.frame, logical(1))
  # the loop ends after this pass once every element is a df
  exit_loop <- all(idfs)
  # remove anything not df or list
  problem1 <- problem1[idfs | vapply(problem1, is.list, logical(1))]
  # find dfs again (logical)
  idfs <- vapply(problem1, is.data.frame, logical(1))
  # unlist only the non-df part, by one level
  problem1 <- c(problem1[idfs], unlist(problem1[!idfs], recursive = FALSE))
}
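Two side effects of the loop are easy to miss: it overwrites problem1 in place, and the data frames come back in the order x5, x7, x2.x4 rather than in the order of result1. A minimal sketch that wraps the same loop in a function (the name flatten_dfs_loop is hypothetical) so the input list is left untouched and the output order can be checked:

```r
# Hypothetical wrapper around the loop above; works on a copy of its argument
flatten_dfs_loop <- function(x) {
  exit_loop <- FALSE
  while (!exit_loop) {
    idfs <- vapply(x, is.data.frame, logical(1))
    exit_loop <- all(idfs)
    x <- x[idfs | vapply(x, is.list, logical(1))]  # keep only dfs and lists
    idfs <- vapply(x, is.data.frame, logical(1))
    x <- c(x[idfs], unlist(x[!idfs], recursive = FALSE))  # flatten one level
  }
  x
}

problem1 <- list(x1 = 1,
                 x2 = list(x3 = "dog",
                           x4 = data.frame(cats = c(1, 2), pigs = c(3, 4))),
                 x5 = data.frame(sheep = c(1, 2, 3), goats = c(4, 5, 6)),
                 x6 = list(a = 2, b = "c"),
                 x7 = head(cars, 5))

out <- flatten_dfs_loop(problem1)
names(out)   # "x5" "x7" "x2.x4"
```

Data frames already at the top level are moved to the front on the first pass, which is why x2.x4 ends up last.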

asked Dec 28 '21 at 21:12

Will


1 Answer

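Consider a simple recursive function like this one. It checks is.data.frame() before is.list() (a data frame is also a list), wraps each data frame it finds in list(), drops every other non-list element, and otherwise recurses into each element and flattens the results by one level with unlist(..., recursive = FALSE).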

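find_df <- function(x) {
  # test for a data frame first, since a data frame is also a list
  if (is.data.frame(x))
    return(list(x))  # wrap it so the one-level unlist() below keeps it intact
  # drop anything that is neither a data frame nor a list
  if (!is.list(x))
    return(NULL)
  # recurse into each element and flatten the results by one level
  unlist(lapply(x, find_df), recursive = FALSE)
}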

Results

> find_df(problem1)
$x2.x4
  cats pigs
1    1    3
2    2    4

$x5
  sheep goats
1     1     4
2     2     5
3     3     6

$x7
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
answered Sep 27 '22 at 20:09

ekoam
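A self-contained run that checks find_df() against the target result1 from the question. One behaviour worth knowing: when the input contains no data frames at all, find_df() returns NULL rather than an empty list. The wrapper find_df_list() below is a hypothetical addition, not part of the answer, for callers that always want a list back:

```r
# find_df() as in the answer, with the recursive argument named
find_df <- function(x) {
  if (is.data.frame(x))
    return(list(x))
  if (!is.list(x))
    return(NULL)
  unlist(lapply(x, find_df), recursive = FALSE)
}

# Hypothetical helper: return list() instead of NULL when nothing is found
find_df_list <- function(x) {
  out <- find_df(x)
  if (is.null(out)) list() else out
}

problem1 <- list(x1 = 1,
                 x2 = list(x3 = "dog",
                           x4 = data.frame(cats = c(1, 2), pigs = c(3, 4))),
                 x5 = data.frame(sheep = c(1, 2, 3), goats = c(4, 5, 6)),
                 x6 = list(a = 2, b = "c"),
                 x7 = head(cars, 5))
result1 <- list(x2.x4 = data.frame(cats = c(1, 2), pigs = c(3, 4)),
                x5 = data.frame(sheep = c(1, 2, 3), goats = c(4, 5, 6)),
                x7 = head(cars, 5))

all.equal(find_df(problem1), result1)          # TRUE
find_df(list(a = 1, b = list(c = "x")))        # NULL
find_df_list(list(a = 1, b = list(c = "x")))   # list()
```

Unlike the loop in the question, find_df() keeps the data frames in the order they appear in the input and does not modify problem1.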