I would like to standardize variables in R. I know there are multiple ways to do this. However, I really like the approach below:
library(tidyverse)
df <- mtcars
df %>%
gather() %>%
group_by(key) %>%
mutate(value = value - mean(value)) %>%
ungroup() %>%
pivot_wider(names_from = key, values_from = value)
For some reason this approach does not work: I am not able to return the data to its original wide format. I would therefore like to ask for advice.
Method 1: Using the scale() function. R has a built-in function called scale() for standardization, with the signature scale(x, center = TRUE, scale = TRUE). Here, “x” represents the data column or dataset that you want to standardize. The “center” parameter accepts a logical value (or a numeric vector of centers); when it is set to TRUE, the column mean is subtracted from each observation.
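As a minimal sketch of the scale() call described above, the following applies it to two mtcars columns. The choice of columns and the variable names (cols, scaled, centered_only) are illustrative assumptions, not part of the original snippet.

```r
# Illustrative subset of mtcars; any numeric columns would do
cols <- mtcars[, c("mpg", "hp")]

# Center on the column means and divide by the column standard deviations
scaled <- scale(cols, center = TRUE, scale = TRUE)

# Center only: subtract the mean but keep the original spread
centered_only <- scale(cols, center = TRUE, scale = FALSE)

colMeans(scaled)               # numerically zero for both columns
apply(scaled, 2, sd)           # one for both columns
attr(scaled, "scaled:center")  # the means that were subtracted
```

Note that scale() returns a matrix even when given a data frame, and it stores the centers and scales it used as attributes.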
To standardize a dataset means to scale all of the values in the dataset such that the mean value is 0 and the standard deviation is 1. The most common way to do this is z-score standardization, which scales values using the formula (x_i - x̄) / s, where x_i is an observed value, x̄ is the sample mean, and s is the sample standard deviation.
Typically, to standardize variables, you calculate the mean and standard deviation for a variable. Then, for each observed value of the variable, you subtract the mean and divide by the standard deviation.
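The two steps above (subtract the mean, divide by the standard deviation) can be sketched by hand for a single variable. The variable names x and z are illustrative, and mtcars$mpg is used only as convenient example data.

```r
# Example variable; any numeric vector without missing values works
x <- mtcars$mpg

# z-score: subtract the mean, then divide by the sample standard deviation
z <- (x - mean(x)) / sd(x)

mean(z)  # numerically zero
sd(z)    # one

# The manual result agrees with scale(), which returns a one-column matrix
all.equal(z, as.numeric(scale(x)))
```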
dplyr is a package for making tabular data wrangling easier by using a limited set of functions that can be combined to extract and summarize insights from your data. It pairs nicely with tidyr which enables you to swiftly convert between different data formats (long vs. wide) for plotting and analysis.
It is definitely faster than plyr (relevant only for large data sets). Maybe later I'll do up a dplyr example.
The scale() function can be used together with the dplyr package to scale one or more variables in a data frame using z-score standardization.
Standardizing binary variables makes their interpretation vague, since a binary variable cannot increase by one standard deviation. The simplest solution is not to standardize binary variables but to code them as 0/1, and then standardize all other continuous variables by dividing by two standard deviations.
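One way this could look with dplyr is sketched below. It assumes that "standardize by two standard deviations" means centering on the mean and dividing by twice the standard deviation, and that a column counts as binary when all its values are 0 or 1. The helper is_binary() is hypothetical and defined here only for illustration.

```r
library(dplyr)

# Hypothetical helper: TRUE when a column contains only 0s and 1s
is_binary <- function(x) all(x %in% c(0, 1))

rescaled <- mtcars %>%
  as_tibble() %>%
  mutate(across(
    # select numeric columns that are not 0/1 indicators
    where(~ is.numeric(.x) && !is_binary(.x)),
    # center, then divide by two standard deviations
    ~ (.x - mean(.x)) / (2 * sd(.x))
  ))

# In mtcars, vs and am are 0/1 and are left untouched;
# every rescaled column ends up with a standard deviation of 0.5
```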
According to the current documentation, you should use across()-based syntax to perform operations on a desired subset of columns. You can use everything() to select all columns, or any other available selection helper. You should only use the group_by() verb if you want to perform an operation on groups of rows; group_by() is not the right choice for selecting variables.
mtcars %>%
as_tibble() %>%
mutate(across(where(is.numeric), ~ . - mean(.)))
As for the actual standardisation or any other operation you want to apply to the subset of columns you can use:
.fns: Functions to apply to each of the selected columns. Possible values are:
- NULL, to return the columns untransformed.
- A function, e.g. mean.
- A purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE)
- A list of functions/lambdas, e.g. list(mean = mean, n_miss = ~ sum(is.na(.x)))
So for scale() you can do:
mtcars %>%
as_tibble() %>%
mutate(across(where(is.numeric), scale))
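or, with additional arguments, wrap the call in a lambda (passing extra arguments through across() itself is deprecated as of dplyr 1.1.0):
mtcars %>%
as_tibble() %>%
mutate(across(where(is.numeric), ~ scale(.x, center = FALSE)))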
As you can see from the ?scale documentation, the function returns a matrix. In the examples above, each column therefore becomes a one-column matrix. If this bothers you, you can do:
mtcars %>%
as_tibble() %>%
mutate(across(where(is.numeric), ~ scale(.)[,1]))
>> mtcars %>%
... as_tibble() %>%
... mutate(across(where(is.numeric), ~ scale(.)[,1])) %>%
... glimpse()
Rows: 32
Columns: 11
$ mpg <dbl> 0.15088482, 0.15088482, 0.44954345, 0.21725341, -0.23073453, -0.33028740, -0.96078…
$ cyl <dbl> -0.1049878, -0.1049878, -1.2248578, -0.1049878, 1.0148821, -0.1049878, 1.0148821, …
$ disp <dbl> -0.57061982, -0.57061982, -0.99018209, 0.22009369, 1.04308123, -0.04616698, 1.0430…
$ hp <dbl> -0.53509284, -0.53509284, -0.78304046, -0.53509284, 0.41294217,
...
>>
>>
>> mtcars %>%
... as_tibble() %>%
... mutate(across(where(is.numeric), scale)) %>%
... glimpse()
Rows: 32
Columns: 11
$ mpg <dbl[,1]> <matrix[32 x 1]>
$ cyl <dbl[,1]> <matrix[32 x 1]>
$ disp <dbl[,1]> <matrix[32 x 1]>
$ hp <dbl[,1]> <matrix[32 x 1]>
...
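For completeness, the long-format approach from the question can be made to round-trip. The sketch below assumes the problem is that gather() discards row identity, so pivot_wider() cannot tell which values belong to the same row. The row_id column and the name centered are illustrative additions, and pivot_longer() is swapped in for gather().

```r
library(dplyr)
library(tidyr)

centered <- mtcars %>%
  as_tibble() %>%
  # keep track of which row each value came from
  mutate(row_id = row_number()) %>%
  pivot_longer(-row_id, names_to = "key", values_to = "value") %>%
  group_by(key) %>%
  # subtract the per-variable mean
  mutate(value = value - mean(value)) %>%
  ungroup() %>%
  # row_id uniquely identifies rows, so the reshape is unambiguous
  pivot_wider(names_from = key, values_from = value) %>%
  select(-row_id)

dim(centered)  # 32 rows, 11 columns, as in mtcars
```

This works, but the across() form above does the same thing without reshaping the data twice.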