I have multiple dataframes/tibbles with exactly the same structure but different contents. Their names are the only way I can tell them apart. The objective is to merge them all into one dataframe, with a factor column. The original dataframes have one column for each hour/measurement, so first I want to gather everything.
Imagine columns 5 to 11 of the mtcars df are my hour columns.
mt1 <- mtcars
mt2 <- mtcars
mt3 <- mtcars
mt4 <- mtcars
mtlist <- list(m1 = mt1,
               m2 = mt2,
               m3 = mt3,
               m4 = mt4)
library(tidyverse)
# Gather the hour columns (5 to 11) of every data frame into long format
mtlist_tidy <- lapply(mtlist, function(x) {
  x %>%
    gather(exp, temp_name, 5:11)
})
Now I'm stuck. I need to rename the "temp_name" column in each of the dfs inside mtlist_tidy to the name of that df, i.e. m1, m2, etc.:
> head(mtlist_tidy$m1)
mpg cyl disp hp exp temp_name
1 21.0 6 160 110 drat 3.90
2 21.0 6 160 110 drat 3.90
3 22.8 4 108 93 drat 3.85
4 21.4 6 258 110 drat 3.08
5 18.7 8 360 175 drat 3.15
6 18.1 6 225 105 drat 2.76
should become
> head(mtlist_tidy$m1)
mpg cyl disp hp exp m1
1 21.0 6 160 110 drat 3.90
2 21.0 6 160 110 drat 3.90
3 22.8 4 108 93 drat 3.85
4 21.4 6 258 110 drat 3.08
5 18.7 8 360 175 drat 3.15
6 18.1 6 225 105 drat 2.76
Then purrr::reduce(mtlist_tidy, full_join) would work, completing my task.
I guess there must be a solution using only purrr that skips lapply, but I'm not yet that familiar with this package.
A couple of ideas:
First, to approach the problem the way you currently are, you could use map2 to loop through both the list and the names of the list simultaneously. You can then name the new columns as you go with the list names via gather_ (for standard evaluation).
map2(mtlist, names(mtlist), ~gather_(.x, "exp", .y, names(.x)[5:11]) )
Note that the next version of purrr will have imap as a shortcut for looping through a list and the names of the list. Also, the next version of tidyr will use tidyeval, and gather_ will be deprecated.
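The note above can be sketched as follows, assuming a purrr version that provides imap and tidyr 1.0.0 or later, where pivot_longer is available as a replacement for gather. This is a swapped-in alternative to the gather_ call, not the code from the original answer; the name mtlist_named is made up for illustration.

```r
library(purrr)
library(tidyr)

# Same toy input as in the question: four copies of mtcars in a named list
mtlist <- list(m1 = mtcars, m2 = mtcars, m3 = mtcars, m4 = mtcars)

# imap passes each element as .x and its name as .y,
# so the value column can be named after the list element directly
mtlist_named <- imap(
  mtlist,
  ~ pivot_longer(.x, cols = 5:11, names_to = "exp", values_to = .y)
)

names(mtlist_named$m1)
```

Each element of mtlist_named then has its value column named m1, m2, and so on. Note that pivot_longer orders rows differently from gather, so the rows will not appear in the same order as in the output shown in the question.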
Second, you could keep things in a long format by using map_df for the looping instead of lapply. map_df uses bind_rows under the hood at the end, and you can include a grouping variable for each list element via the .id argument.
mtlist %>%
  map_df(~ .x %>% gather("exp", "temp_name", 5:11), .id = "name")
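To make the role of .id more concrete, here is a minimal sketch that calls bind_rows directly on the named list and reshapes once afterwards. It assumes tidyr 1.0.0 or later for pivot_longer, and the name long is only illustrative. Because the .id column is added in front, the hour columns are selected by name (drat:carb) rather than by position.

```r
library(dplyr)
library(tidyr)

# Same toy input as in the question
mtlist <- list(m1 = mtcars, m2 = mtcars, m3 = mtcars, m4 = mtcars)

# bind_rows stacks the data frames and stores each list name in "name"
long <- bind_rows(mtlist, .id = "name") %>%
  pivot_longer(cols = drat:carb, names_to = "exp", values_to = "temp_name")

head(long)
```

The result has one row per original row and hour column, with the source data frame recorded in name, which is the same shape the map_df call produces.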
To put your dataset in a wide format from here, you can use spread. It takes a little more work in this example because some of the identifying variables, like hp and disp, have the same value across multiple rows, so a row index is added within each group to keep the rows distinct.
mtlist %>%
  map_df(~ .x %>% gather("exp", "temp_name", 5:11), .id = "name") %>%
  group_by(name) %>%
  mutate(rows = 1:n()) %>%
  spread(name, temp_name)
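On tidyr 1.0.0 or later, the same wide result can be sketched with pivot_longer and pivot_wider. This is an alternative under stated assumptions rather than the original answer's code: the helper column row and the result name wide are made up here, and the row index is added to each data frame before reshaping so that every original row stays identifiable.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Same toy input as in the question
mtlist <- list(m1 = mtcars, m2 = mtcars, m3 = mtcars, m4 = mtcars)

wide <- mtlist %>%
  # Tag every original row so duplicated hp/disp values stay distinct
  map(~ mutate(.x, row = row_number())) %>%
  bind_rows(.id = "name") %>%
  pivot_longer(cols = drat:carb, names_to = "exp", values_to = "temp_name") %>%
  # One value column per source data frame: m1, m2, m3, m4
  pivot_wider(names_from = name, values_from = temp_name)

head(wide)
```

The row column can be dropped afterwards with select(-row) once it is no longer needed.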