
Standardize variables using dplyr [r]

I would like to standardize variables in R. I know of several ways to do this, but I really like the approach below:

library(tidyverse)

df <- mtcars

df %>% 
  gather() %>% 
  group_by(key) %>% 
  mutate(value = value - mean(value)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = key, values_from = value)

For some reason this approach does not work: I am not able to return the data to its original wide format. I would appreciate any advice.

Petr asked Jul 03 '20 11:07




1 Answer

According to the current documentation, you should use across()-based syntax to apply an operation to a chosen subset of columns. You can use everything() to select all columns, or any other selection helper. Use group_by() only when you want to perform an operation per group; it is not the right tool for selecting variables.

mtcars %>%
    as_tibble() %>%
    mutate(across(where(is.numeric), ~ . - mean(.)))
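For comparison, the reshaping approach from the question can be made to round-trip as well. The likely reason it fails is that, after gather(), nothing records which row each value came from, so pivot_wider() cannot rebuild the rows. The sketch below adds a row identifier before reshaping; row_id is a hypothetical helper column, and pivot_longer() is assumed to be available (tidyr 1.0.0 or later).

```r
library(dplyr)
library(tidyr)

centered_long <- mtcars %>%
  as_tibble() %>%
  # Hypothetical helper column: remembers which row each value belongs to
  mutate(row_id = row_number()) %>%
  pivot_longer(-row_id, names_to = "key", values_to = "value") %>%
  group_by(key) %>%
  mutate(value = value - mean(value)) %>%
  ungroup() %>%
  # row_id uniquely identifies each output row, so the wide shape is recoverable
  pivot_wider(names_from = key, values_from = value) %>%
  select(-row_id)
```

This should give the same centred columns as the across() version, at the cost of two reshaping steps, which is why across() is usually the simpler choice.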

As for the actual standardisation, or any other operation you want to apply to the subset of columns, the documentation for across() describes the .fns argument as follows:

.fns Functions to apply to each of the selected columns. Possible values are:

  • NULL, to return the columns untransformed.
  • A function, e.g. mean.
  • A purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE)
  • A list of functions/lambdas, e.g. list(mean = mean, n_miss = ~ sum(is.na(.x)))
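To illustrate the last option, here is a minimal sketch that passes a named list of lambdas to across(). The column choice (mpg and hp) and the list names (centered, z) are arbitrary picks for illustration; with a named list, across() by default appends each name to the column name.

```r
library(dplyr)

result <- mtcars %>%
  as_tibble() %>%
  mutate(across(
    c(mpg, hp),
    # The list names become suffixes: mpg_centered, mpg_z, hp_centered, hp_z
    list(
      centered = ~ .x - mean(.x),
      z        = ~ (.x - mean(.x)) / sd(.x)
    )
  ))
```

The original columns are kept and the new ones are added alongside them, which is handy when you want to compare raw and standardised values.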

So for scale you can do:

mtcars %>%
    as_tibble() %>%
    mutate(across(where(is.numeric), scale))

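or with additional arguments (wrapped in a lambda, since passing them through ... in across() is deprecated as of dplyr 1.1.0):

mtcars %>%
    as_tibble() %>%
    mutate(across(where(is.numeric), ~ scale(.x, center = FALSE)))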
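One detail worth knowing about center = FALSE: per ?scale, the column is then divided by its root mean square rather than by its standard deviation. The sketch below checks this on a single column; the variable names x, rms and scaled are only illustrative.

```r
x <- mtcars$mpg

# Root mean square as described in ?scale: sqrt(sum(x^2) / (n - 1))
rms <- sqrt(sum(x^2) / (length(x) - 1))

# scale() returns a one-column matrix; as.numeric() drops it to a plain vector
scaled <- as.numeric(scale(x, center = FALSE))

# Should be TRUE if scale() divided by the root mean square
all.equal(scaled, x / rms)
```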

Side notes

As the ?scale documentation shows, the function returns a matrix. In the examples above, each column therefore becomes a one-column matrix. If this bothers you, you can extract the first column:

mtcars %>%
    as_tibble() %>%
    mutate(across(where(is.numeric),  ~ scale(.)[,1]))
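Another way to avoid the matrix columns is to compute the z-score by hand. This is a sketch rather than a replacement for scale(); z_score is a hypothetical helper name, and the na.rm = TRUE handling is an assumption about how missing values should be treated.

```r
library(dplyr)

# Hypothetical helper: returns a plain numeric vector, not a matrix
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

standardized <- mtcars %>%
  as_tibble() %>%
  mutate(across(where(is.numeric), z_score))
```

For complete data such as mtcars, this should match the scale(.)[,1] result, because scale() also uses the n - 1 denominator when centering is on.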

Comparison

>> mtcars %>%
...     as_tibble() %>%
...     mutate(across(where(is.numeric),  ~ scale(.)[,1])) %>% 
...     glimpse()
Rows: 32
Columns: 11
$ mpg  <dbl> 0.15088482, 0.15088482, 0.44954345, 0.21725341, -0.23073453, -0.33028740, -0.96078…
$ cyl  <dbl> -0.1049878, -0.1049878, -1.2248578, -0.1049878, 1.0148821, -0.1049878, 1.0148821, …
$ disp <dbl> -0.57061982, -0.57061982, -0.99018209, 0.22009369, 1.04308123, -0.04616698, 1.0430…
$ hp   <dbl> -0.53509284, -0.53509284, -0.78304046, -0.53509284, 0.41294217, 
...
>> 
>> 
>> mtcars %>%
...     as_tibble() %>%
...     mutate(across(where(is.numeric), scale)) %>% 
...     glimpse()
Rows: 32
Columns: 11
$ mpg  <dbl[,1]> <matrix[32 x 1]>
$ cyl  <dbl[,1]> <matrix[32 x 1]>
$ disp <dbl[,1]> <matrix[32 x 1]>
$ hp   <dbl[,1]> <matrix[32 x 1]>
...
Konrad answered Sep 22 '22 16:09