My data looks like:
[[1]]
date germany france germany_mean france_mean germany_sd france_sd
1 2016-01-01 17 25 21.29429 48.57103 30.03026 47.05169
What I am trying to do is compute the following calculation over all the lists using map.
germany_calc = (germany - germany_mean) / germany_sd
france_calc = (france - france_mean) / france_sd
However, the number of columns can change: here there are two categories/countries, but in another list there could be 1 or 3 or N. The columns always follow the same structure. That is,
"country1", "country2", ... , "countryN", "country1_mean", "country2_mean", ... , "countryN_mean", "country1_sd", "country2_sd", ... , "countryN_sd".
Expected Output (for the first list):
Germany: -0.1429988 = (17 - 21.29429) / 30.03026
France: -0.5009603 = (25 - 48.57103) / 47.05169
EDIT: Apologies - expected output:
-0.1429988
-0.5009603
Function:
Scale_Me <- function(x){
(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
Data:
my_list <- list(structure(list(date = structure(16801, class = "Date"),
germany = 17, france = 25, germany_mean = 21.2942922374429,
france_mean = 48.5710301846855, germany_sd = 30.030258443028,
france_sd = 47.0516928425878), class = "data.frame", row.names = c(NA,
-1L)), structure(list(date = structure(16802, class = "Date"),
germany = 9, france = 29, germany_mean = 21.2993150684932,
france_mean = 48.5605316914534, germany_sd = 30.0286190461173,
france_sd = 47.0543871206842), class = "data.frame", row.names = c(NA,
-1L)), structure(list(date = structure(16803, class = "Date"),
germany = 8, france = 18, germany_mean = 21.2947488584475,
france_mean = 48.551889593794, germany_sd = 30.0297291333284,
france_sd = 47.0562416513092), class = "data.frame", row.names = c(NA,
-1L)), structure(list(date = structure(16804, class = "Date"),
germany = 3, france = 11, germany_mean = 21.2778538812785,
france_mean = 48.5382545766386, germany_sd = 30.0267943793948,
france_sd = 47.0607680244109), class = "data.frame", row.names = c(NA,
-1L)), structure(list(date = structure(16805, class = "Date"),
germany = 4, france = 13, germany_mean = 21.2614155251142,
france_mean = 48.5214531240057, germany_sd = 30.0269420596686,
france_sd = 47.0676011750263), class = "data.frame", row.names = c(NA,
-1L)), structure(list(date = structure(16806, class = "Date"),
germany = 4, france = 9, germany_mean = 21.253196347032,
france_mean = 48.5055948249362, germany_sd = 30.0292032528186,
france_sd = 47.0737183354519), class = "data.frame", row.names = c(NA,
-1L)))
To access a specific column in a dataframe by name, you use the $ operator in the form df$name where df is the name of the dataframe, and name is the name of the column you are interested in. This operation will then return the column you want as a vector.
To pick out one or more columns, use the select() function from the dplyr package. The select() function expects a dataframe as its first input ('argument', in R terminology), followed by the names of the columns you want to extract, with a comma between each name.
One of the most famous and most used features of R is the *apply() family of functions, such as apply(), tapply(), and lapply(). Here, we'll look at apply(), which instructs R to call a user-specified function on each of the rows or each of the columns of a matrix.
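As a small illustration of the $ operator and apply() described above, here is a minimal sketch. The data frame scores and the variable names are made up for the example; the values are the first three germany/france observations from the data in the question.

```r
# Hypothetical toy data frame, using the first three observations from the question
scores <- data.frame(germany = c(17, 9, 8), france = c(25, 29, 18))

# $ returns a single column as a vector
germany_col <- scores$germany

# apply() works on a matrix: MARGIN = 2 calls the function once per column,
# MARGIN = 1 calls it once per row
col_means <- apply(as.matrix(scores), 2, mean)
row_sums  <- apply(as.matrix(scores), 1, sum)
```

Here col_means is a named vector with one entry per country, and row_sums has one entry per row.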
Why not just rbind the thing?
with(do.call(rbind, my_list),
cbind(germany=(germany - germany_mean) / germany_sd,
france=(france - france_mean) / france_sd))
# germany france
# [1,] -0.1429988 -0.5009603
# [2,] -0.4095864 -0.4157005
# [3,] -0.4427196 -0.6492633
# [4,] -0.6087181 -0.7976550
# [5,] -0.5748642 -0.7546901
# [6,] -0.5745473 -0.8392283
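If the number of countries varies between lists, the names do not have to be hard-coded. Below is a minimal sketch, assuming every country column has matching _mean and _sd columns as described in the question. scale_by_stats and sample_list are hypothetical names; sample_list just copies the first two elements of my_list so the snippet runs on its own.

```r
# Hypothetical stand-in for my_list: its first two elements, copied from the question
sample_list <- list(
  data.frame(date = as.Date("2016-01-01"), germany = 17, france = 25,
             germany_mean = 21.2942922374429, france_mean = 48.5710301846855,
             germany_sd = 30.030258443028, france_sd = 47.0516928425878),
  data.frame(date = as.Date("2016-01-02"), germany = 9, france = 29,
             germany_mean = 21.2993150684932, france_mean = 48.5605316914534,
             germany_sd = 30.0286190461173, france_sd = 47.0543871206842)
)

# Hypothetical helper: scale each country by its precomputed *_mean and *_sd columns
scale_by_stats <- function(df) {
  # Country names are recovered from the columns ending in "_mean"
  countries <- sub("_mean$", "", grep("_mean$", names(df), value = TRUE))
  # Element-wise arithmetic on equally shaped data frames
  out <- (df[countries] - df[paste0(countries, "_mean")]) /
    df[paste0(countries, "_sd")]
  names(out) <- countries
  out
}

# One row per list element, one column per country
res <- do.call(rbind, lapply(sample_list, scale_by_stats))
```

For the first element this gives the values from the expected output in the question (about -0.1429988 for germany and -0.5009603 for france). To stay with map, purrr::map(my_list, scale_by_stats) should return the same per-element data frames as lapply().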