
From list to data frame with tidyverse, selecting specific list elements

Tags: r, tidyverse

A simple question but I've searched for a solution, and so far to no avail.

Say I have a list object and want to pull specific list elements and output them side by side as data frame columns. How can I achieve this in a simple way with tidyverse/piping? My attempt is below.

Data

some_data <-
structure(list(x = c(23.7, 23.41, 23.87, 24.18, 24.15, 24.31, 
23.14, 23.72, 24.12, 23.47, 23.59, 23.29, 23.24, 23.5, 23.56, 
23.16, 23.62, 23.67, 23.84, 23.69, 23.7, 23.68, 24.2, 23.77, 
23.74, 23.64, 24.39, 24.05, 24.51, 23.6, 24.29, 23.31, 23.96, 
24.07, 24.37, 23.77, 23.64, 24, 23.68, 24.02, 23.36, 23.54, 23.34, 
23.69, 23.79, 23.8, 23.7, 24.45, 23.27, 23.57, 23.02, 24.23, 
23.41, 23.6, 24.02, 23.94, 24.06, 23.97, 23.38, 23.46, 24, 23.89, 
23.51, 23.72, 23.83, 23.96, 23.84, 23.52, 24.36, 23.94, 23.82, 
24.04, 24.05, 23.6, 23.52, 24.13, 23.43, 23.33, 24.01, 23.99, 
24.46, 24.23, 24.19, 23.83, 23.8, 23.93, 23.79, 23.48, 23.26, 
24.04, 23.93, 23.98, 23.86, 23.49, 24.17, 23.7, 23.54, 23.55, 
23.67, 23.66)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -100L), spec = structure(list(cols = list(
    x = structure(list(), class = c("collector_double", "collector"
    ))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

I want the values returned by the `hist()` function for this data:

library(tidyverse)

some_data$x %>% 
   as.numeric() %>% 
   hist(breaks = seq(from = 23, to = 24.6, by = 0.2),
        plot = FALSE)

## $breaks
## [1] 23.0 23.2 23.4 23.6 23.8 24.0 24.2 24.4 24.6

## $counts
## [1]  3  9 20 23 19 16  7  3

## $density
## [1] 0.15 0.45 1.00 1.15 0.95 0.80 0.35 0.15

## $mids
## [1] 23.1 23.3 23.5 23.7 23.9 24.1 24.3 24.5

## $xname
## [1] "."

## $equidist
## [1] TRUE

## attr(,"class")
## [1] "histogram"

So let's say that I want both `$breaks` and `$counts` side by side as a data frame.

I extend the original pipe as follows:

some_data$x %>% 
   as.numeric() %>% 
   hist(breaks = seq(from = 23, to = 24.6, by = 0.2),
        plot = FALSE) %>%
##
   map_df(~.[1:30]) %>%
   select(bins = breaks, 
          frequency = counts)
##

## # A tibble: 30 x 2
##     bins frequency
##    <dbl>     <int>
##  1  23           3
##  2  23.2         9
##  3  23.4        20
##  4  23.6        23
##  5  23.8        19
##  6  24          16
##  7  24.2         7
##  8  24.4         3
##  9  24.6        NA
## 10  NA          NA
## # ... with 20 more rows

So yes, it does work, but in map_df() I had to use a relatively large "magic" number (I arbitrarily chose 30) to ensure all the data is included. Is there a simpler way to get $breaks and $counts as a data frame? Maybe even in one step instead of combining map_df() and then select()?
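For context, the padding in the attempt above comes from how R handles indexing past the end of a vector. A minimal base-R illustration, using a made-up vector rather than the question's data:

```r
x <- c(3L, 9L, 20L)   # made-up integer vector of length 3

padded  <- x[1:5]     # positions 4 and 5 do not exist, so R returns NA there
shorter <- x[1:2]     # an index range that is too short silently drops values
```

This is why the "magic" number has to be at least as large as the longest element: anything smaller would truncate the data without a warning.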

COMMENT

While this specific problem uses a histogram object, my general question isn't about histograms but about the principle of working with list objects. The nice thing about the output of hist(plot = FALSE) is that it generates an object with unequal-length elements, which demonstrates a problem that needs a flexible solution to account for the varying element lengths.

SOLUTION

Based on Rémi Coulaud's (chosen) solution below, the way to handle list elements of unequal length is to make them equal by padding each one to the length of the longest element. Then it's no longer a problem. The working pipe is as follows:

library(tidyverse)

some_data$x %>% 
  as.numeric() %>% 
  hist(breaks = seq(from = 23, to = 24.6, by = 0.2),
       plot = FALSE) %>%
  lapply(., `length<-`, max(lengths(.))) %>%  ## pad every element with NA to the length of the longest one
  map_df(~.) %>%
  select(bins = breaks, 
         frequency = counts)
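
As a possible variation on the pipe above, the wanted elements could be selected first with `[` and only those padded, so unrelated elements never enter the data frame. This is a minimal sketch on a made-up list (the names `lst`, `wanted`, and `n` are illustrative and not part of the original question):

```r
library(tidyverse)

# Hypothetical list with unequal-length elements (not the question's data)
lst <- list(breaks = c(0, 1, 2, 3), counts = c(5L, 7L, 2L), label = "demo")

wanted <- lst[c("breaks", "counts")]   # keep only the elements of interest
n <- max(lengths(wanted))              # length of the longest selected element

result <- wanted %>%
  map(~ `length<-`(.x, n)) %>%         # pad shorter elements with NA
  as_tibble() %>%                      # equal-length named list -> tibble
  rename(bins = breaks, frequency = counts)
```

Because the padding length is computed from the selected elements only, a long but irrelevant element elsewhere in the list would not add extra NA rows.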

Thanks!

Emman asked Dec 15 '19 11:12


1 Answer

We can use imap() and enframe() to convert each list element to a two-column data frame: the `name` column holds the position within the element (1, 2, ...), and the value column is named after the list element. We can then use reduce() with full_join() to join all the data frames on `name`. Finally, we can select the columns we want. Because full_join() fills missing positions with NA, this approach does not need a "magic" number.

library(tidyverse)

some_data$x %>% 
  as.numeric() %>% 
  hist(breaks = seq(from = 23, to = 24.6, by = 0.2),
       plot = FALSE) %>%
  imap(~enframe(.x, value = .y)) %>%
  reduce(full_join, by = "name") %>%
  select(bins = breaks, 
         frequency = counts)
# # A tibble: 9 x 2
#   bins frequency
#   <dbl>     <int>
# 1  23           3
# 2  23.2         9
# 3  23.4        20
# 4  23.6        23
# 5  23.8        19
# 6  24          16
# 7  24.2         7
# 8  24.4         3
# 9  24.6        NA
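
To make the intermediate steps of this approach easier to follow, here is a small sketch on a made-up two-element list (the names `lst`, `frames`, and `joined` are illustrative only):

```r
library(tidyverse)

# Hypothetical list with unequal-length elements (not the question's data)
lst <- list(a = c(1.5, 2.5, 3.5), b = c(10L, 20L))

# imap() passes each element as .x and its name as .y, so every element
# becomes a tibble with a position column `name` and a column named after it
frames <- imap(lst, ~ enframe(.x, value = .y))

# Joining on the position column lines the elements up row by row;
# positions missing from the shorter element are filled with NA
joined <- reduce(frames, full_join, by = "name")
```

Here `joined` keeps the helper column `name` alongside `a` and `b`, which is why the pipe in the answer ends with select().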
www answered Oct 12 '22 19:10