I have a list of data frames, and the names of the list elements contain information about each data frame.
Here is a reproducible example:
list_df <- list(jan_2013 = data.frame(id = 1:10, x = rnorm(10), y = runif(10)),
feb_2013 = data.frame(id = 1:10, x = rnorm(10), y = runif(10)))
How can I create a column in each data frame with the information contained in the element names?
I'm working with purrr operations over the list, so how can I use purrr::map to iterate over each data frame and have access to the name of the list element in which it is stored?
Expected output:
$jan_2013
id x y meta_information
1 0.2 2.3 jan_2013
2 0.3 2.1 jan_2013
$feb_2013
id x y meta_information
1 0.1 2.4 feb_2013
2 1.4 2.1 feb_2013
An alternate approach is to collapse your list into a single data frame and use the name of the list as an additional column.
dplyr::bind_rows(list_df, .id = "meta_information")
# # A tibble: 20 x 4
# meta_information id x y
# <chr> <int> <dbl> <dbl>
# 1 jan_2013 1 -1.09 0.877
# 2 jan_2013 2 0.136 0.828
# 3 jan_2013 3 -0.376 0.0376
# 4 jan_2013 4 -0.793 0.780
# 5 jan_2013 5 0.259 0.179
# 6 jan_2013 6 0.971 0.556
# 7 jan_2013 7 -0.787 0.579
# 8 jan_2013 8 -0.294 0.563
# 9 jan_2013 9 0.331 0.896
# 10 jan_2013 10 -0.392 0.577
# 11 feb_2013 1 0.0139 0.0381
# 12 feb_2013 2 0.640 0.0744
# 13 feb_2013 3 0.813 0.270
# 14 feb_2013 4 -0.748 0.305
# 15 feb_2013 5 0.528 0.380
# 16 feb_2013 6 -0.627 0.832
# 17 feb_2013 7 -1.21 0.0529
# 18 feb_2013 8 1.45 0.494
# 19 feb_2013 9 0.490 0.402
# 20 feb_2013 10 -0.765 0.531
If it is really necessary to keep the data frames separate, we can use an indexed map from purrr, where .x is the data frame and .y is its name:
purrr::imap(list_df, ~ dplyr::mutate(.x, meta_information = .y))
# $jan_2013
# id x y meta_information
# 1 1 -1.0867168 0.87674573 jan_2013
# 2 2 0.1357794 0.82798892 jan_2013
# 3 3 -0.3763973 0.03761698 jan_2013
# 4 4 -0.7934503 0.77968454 jan_2013
# 5 5 0.2586395 0.17917052 jan_2013
# 6 6 0.9707220 0.55617247 jan_2013
# 7 7 -0.7871748 0.57870521 jan_2013
# 8 8 -0.2939041 0.56255010 jan_2013
# 9 9 0.3307507 0.89646137 jan_2013
# 10 10 -0.3917830 0.57723403 jan_2013
#
# $feb_2013
# id x y meta_information
# 1 1 0.01386418 0.03814336 feb_2013
# 2 2 0.64030914 0.07435783 feb_2013
# 3 3 0.81281978 0.26987216 feb_2013
# 4 4 -0.74768467 0.30482967 feb_2013
# 5 5 0.52820991 0.38045027 feb_2013
# 6 6 -0.62720336 0.83191998 feb_2013
# 7 7 -1.20532079 0.05291640 feb_2013
# 8 8 1.45277032 0.49355127 feb_2013
# 9 9 0.48985425 0.40229656 feb_2013
# 10 10 -0.76508432 0.53114667 feb_2013
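The same idea can be written with an explicit anonymous function instead of the formula shorthand, which makes it easier to see what imap passes in. This is only a minimal sketch: toy_list is a hypothetical stand-in for list_df, and the column is added with base R assignment so that only purrr is needed.

```r
library(purrr)

# Hypothetical toy list standing in for list_df
toy_list <- list(
  jan_2013 = data.frame(id = 1:2, x = c(0.2, 0.3)),
  feb_2013 = data.frame(id = 1:2, x = c(0.1, 1.4))
)

# imap() calls the function with each element as the first argument
# and that element's name as the second argument
tagged <- imap(toy_list, function(df, name) {
  df$meta_information <- name  # constant column, recycled to every row
  df
})

# Inspect the new column of one element
tagged$feb_2013$meta_information
```

If the list had no names, imap would pass the position of each element instead of a name.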
I found a way to do the task with purrr::map2, iterating over two arguments in parallel: list_df and names(list_df). An anonymous function then uses these two arguments, taking a data frame (df) and creating a constant column based on the name of the element (name_elem_contain_df) that contains the data frame:
purrr::map2(list_df, names(list_df),
            function(df, name_elem_contain_df) dplyr::mutate(df, meta_information = name_elem_contain_df))
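For a named list, purrr documents imap(x, ...) as shorthand for map2(x, names(x), ...), so this map2 call and an imap call should give the same result. A small sketch to check that, assuming purrr and dplyr are installed (toy_list is a hypothetical stand-in for list_df):

```r
library(purrr)
library(dplyr)

# Hypothetical toy list standing in for list_df
toy_list <- list(
  jan_2013 = data.frame(id = 1:2, x = c(0.2, 0.3)),
  feb_2013 = data.frame(id = 1:2, x = c(0.1, 1.4))
)

# Explicit version: iterate over the data frames and their names in parallel
via_map2 <- map2(toy_list, names(toy_list),
                 function(df, name) mutate(df, meta_information = name))

# Shorthand version: imap() supplies the names itself as .y
via_imap <- imap(toy_list, ~ mutate(.x, meta_information = .y))

# Compare the two results
all.equal(via_map2, via_imap)
```

The map2 form is handy when the second vector is something other than the names, for example a separate vector of labels of the same length as the list.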