Create a column from the name of the list element that contains each data frame in R

Tags: r, purrr

I have a list of data frames, and the name of each list element carries information about its data frame.

Here is a reproducible example:

list_df <- list(jan_2013 = data.frame(id = 1:10, x = rnorm(10), y = runif(10)), 
                feb_2013 = data.frame(id = 1:10, x = rnorm(10), y = runif(10)))

How can I create a column in each data frame that holds the information in its element name? I'm using purrr operations over the list, so how can I use purrr::map to iterate over the data frames and access the name under which each one is stored in the list? The desired output looks like this:

$jan_2013
  id    x    y   meta_information
   1    0.2  2.3      jan_2013
   2    0.3  2.1      jan_2013

$feb_2013
  id    x    y   meta_information
   1    0.1  2.4      feb_2013
   2    1.4  2.1      feb_2013
Cristóbal Alcázar asked Jan 14 '18 23:01


2 Answers

An alternative approach is to collapse the list into a single data frame and store the name of each list element in an additional column, using the .id argument of bind_rows.

dplyr::bind_rows(list_df, .id = "meta_information")

# # A tibble: 20 x 4
#   meta_information    id       x      y
#   <chr>            <int>   <dbl>  <dbl>
# 1 jan_2013             1 -1.09   0.877 
# 2 jan_2013             2  0.136  0.828 
# 3 jan_2013             3 -0.376  0.0376
# 4 jan_2013             4 -0.793  0.780 
# 5 jan_2013             5  0.259  0.179 
# 6 jan_2013             6  0.971  0.556 
# 7 jan_2013             7 -0.787  0.579 
# 8 jan_2013             8 -0.294  0.563 
# 9 jan_2013             9  0.331  0.896 
# 10 jan_2013           10 -0.392  0.577 
# 11 feb_2013            1  0.0139 0.0381
# 12 feb_2013            2  0.640  0.0744
# 13 feb_2013            3  0.813  0.270 
# 14 feb_2013            4 -0.748  0.305 
# 15 feb_2013            5  0.528  0.380 
# 16 feb_2013            6 -0.627  0.832 
# 17 feb_2013            7 -1.21   0.0529
# 18 feb_2013            8  1.45   0.494 
# 19 feb_2013            9  0.490  0.402 
# 20 feb_2013           10 -0.765  0.531 

If you really need to keep the data frames in a separate list, use the indexed map from purrr, imap(), which passes each element as .x and its name as .y:

purrr::imap(list_df, ~ dplyr::mutate(.x, meta_information = .y))

# $jan_2013
#    id          x          y meta_information
# 1   1 -1.0867168 0.87674573         jan_2013
# 2   2  0.1357794 0.82798892         jan_2013
# 3   3 -0.3763973 0.03761698         jan_2013
# 4   4 -0.7934503 0.77968454         jan_2013
# 5   5  0.2586395 0.17917052         jan_2013
# 6   6  0.9707220 0.55617247         jan_2013
# 7   7 -0.7871748 0.57870521         jan_2013
# 8   8 -0.2939041 0.56255010         jan_2013
# 9   9  0.3307507 0.89646137         jan_2013
# 10 10 -0.3917830 0.57723403         jan_2013
# 
# $feb_2013
#    id           x          y meta_information
# 1   1  0.01386418 0.03814336         feb_2013
# 2   2  0.64030914 0.07435783         feb_2013
# 3   3  0.81281978 0.26987216         feb_2013
# 4   4 -0.74768467 0.30482967         feb_2013
# 5   5  0.52820991 0.38045027         feb_2013
# 6   6 -0.62720336 0.83191998         feb_2013
# 7   7 -1.20532079 0.05291640         feb_2013
# 8   8  1.45277032 0.49355127         feb_2013
# 9   9  0.48985425 0.40229656         feb_2013
# 10 10 -0.76508432 0.53114667         feb_2013
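
As a small sketch of what imap() does here: for a named list it behaves like map2(x, names(x), ...), and for an unnamed list it falls back to the element positions. The tiny data frames below are made-up values used only for illustration, and the sketch assumes purrr and dplyr are installed.

```r
library(purrr)
library(dplyr)

# Made-up illustration data (not the random data from the question)
list_df <- list(
  jan_2013 = data.frame(id = 1:3, x = c(0.2, 0.3, 0.5)),
  feb_2013 = data.frame(id = 1:3, x = c(0.1, 1.4, 0.9))
)

# Named list: .y is the name of each element
named_result <- imap(list_df, ~ mutate(.x, meta_information = .y))

# The same call spelled out with map2() and names()
map2_result <- map2(list_df, names(list_df),
                    ~ mutate(.x, meta_information = .y))
identical(named_result, map2_result)

# Unnamed list: .y is the integer position, so the new column
# holds 1 for the first data frame, 2 for the second, and so on
unnamed_result <- imap(unname(list_df), ~ mutate(.x, meta_information = .y))
unnamed_result[[2]]
```

If the list might arrive without names, it is worth checking names(list_df) first, since the positional fallback silently produces numbers instead of labels.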
Kevin Arseneau answered Oct 31 '22 04:10


I found a way to do this with purrr::map2, which iterates over two arguments in parallel: list_df and names(list_df). An anonymous function takes both arguments, a data frame (df) and the name of the list element that contains it (name_elem_contain_df), and adds a constant column holding that name.

purrr::map2(list_df, names(list_df), 
    function(df, name_elem_contain_df) dplyr::mutate(df, meta_information = name_elem_contain_df))
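
For comparison, the same parallel iteration can be sketched in base R with Map(), which needs no extra packages. The helper name add_name and the small data frames are hypothetical, chosen only for this illustration.

```r
# Made-up illustration data (not the random data from the question)
list_df <- list(
  jan_2013 = data.frame(id = 1:3, x = c(0.2, 0.3, 0.5)),
  feb_2013 = data.frame(id = 1:3, x = c(0.1, 1.4, 0.9))
)

# Hypothetical helper: add the element name as a constant column
add_name <- function(df, name_elem) {
  df$meta_information <- name_elem  # recycled to every row
  df
}

# Map() walks list_df and names(list_df) in parallel and keeps
# the names of the first argument on the result
result <- Map(add_name, list_df, names(list_df))

names(result)
result$feb_2013
```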
Cristóbal Alcázar answered Oct 31 '22 05:10