
Trying to use dplyr to group_by and apply scale()


I'm trying to use dplyr to group by the stud_ID variable in the following data frame, as in this SO question:

    > str(df)
    'data.frame':   4136 obs. of  4 variables:
     $ stud_ID         : chr  "ABB112292" "ABB112292" "ABB112292" "ABB112292" ...
     $ behavioral_scale: num  3.5 4 3.5 3 3.5 2 NA NA 1 2 ...
     $ cognitive_scale : num  3.5 3 3 3 3.5 2 NA NA 1 1 ...
     $ affective_scale : num  2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...

I tried the following to obtain scale scores by student (rather than scale scores for observations across all students):

    scaled_data <-
      df %>%
      group_by(stud_ID) %>%
      mutate(behavioral_scale_ind = scale(behavioral_scale),
             cognitive_scale_ind = scale(cognitive_scale),
             affective_scale_ind = scale(affective_scale))

Here is the result:

    > str(scaled_data)
    Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 4136 obs. of  7 variables:
     $ stud_ID             : chr  "ABB112292" "ABB112292" "ABB112292" "ABB112292" ...
     $ behavioral_scale    : num  3.5 4 3.5 3 3.5 2 NA NA 1 2 ...
     $ cognitive_scale     : num  3.5 3 3 3 3.5 2 NA NA 1 1 ...
     $ affective_scale     : num  2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...
     $ behavioral_scale_ind: num [1:12, 1] 0.64 1.174 0.64 0.107 0.64 ...
      ..- attr(*, "scaled:center")= num 2.9
      ..- attr(*, "scaled:scale")= num 0.937
     $ cognitive_scale_ind : num [1:12, 1] 1.17 0.64 0.64 0.64 1.17 ...
      ..- attr(*, "scaled:center")= num 2.4
      ..- attr(*, "scaled:scale")= num 0.937
     $ affective_scale_ind : num [1:12, 1] 0 1.28 0.64 0.64 0 ...
      ..- attr(*, "scaled:center")= num 2.5
      ..- attr(*, "scaled:scale")= num 0.782

The three scaled variables (behavioral_scale_ind, cognitive_scale_ind, and affective_scale_ind) have only 12 observations, the same number of observations as the first student, ABB112292.

What's going on here? How can I obtain scaled scores by individual?

Joshua Rosenberg asked Mar 03 '16 15:03



People also ask

How does scale() work in R?

scale() is a generic function in R whose default method centers and/or scales the columns of a numeric matrix. The center argument takes either a logical value or a numeric-alike vector with one entry per column. If a numeric vector is supplied, each column of the matrix has the corresponding value from center subtracted from it.
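
A small base R sketch of that behaviour, using a made-up matrix `m` and vector (both are illustrative, not from the question). It also shows that a plain vector comes back as a one-column matrix carrying the "scaled:center" and "scaled:scale" attributes, which matches the `num [1:12, 1]` columns in the question's output.

```r
# Illustrative data: a 3 x 2 numeric matrix
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)

# Default: subtract each column's mean, then divide by its standard deviation
z <- scale(m)
attr(z, "scaled:center")   # the column means that were subtracted

# Numeric center: subtract 1 from column 1 and 10 from column 2, no rescaling
c2 <- scale(m, center = c(1, 10), scale = FALSE)

# A plain numeric vector is treated as a one-column matrix
v <- scale(c(3.5, 4, 3.5, 3))
is.matrix(v)               # TRUE
dim(v)                     # 4 rows, 1 column

# as.numeric() drops the dimensions and attributes again
v_plain <- as.numeric(v)
```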

What does %>% do in dplyr?

%>% is the forward pipe operator. It forwards a value, or the result of an expression, into the next function call or expression, which makes it possible to chain commands. It is defined by the magrittr package (CRAN) and is heavily used by dplyr (CRAN).

What R package is group_by() in?

group_by() belongs to the dplyr package. It marks a data frame as grouped so that subsequent operations are performed per group. On its own it does not change the data, so it is normally followed by a verb such as mutate() or summarise().

Can you use dplyr in a function?

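dplyr functions use non-standard evaluation. That is why you do not have to quote your variable names when you do something like select(mtcars, mpg). (In current dplyr versions select(mtcars, "mpg") also works, but a quoted name is not treated as a column in verbs such as filter() or mutate().) When you use dplyr inside your own functions, you therefore need a way to pass column names programmatically, which older dplyr versions handled through "standard evaluation" variants of the verbs.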
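
As a hedged sketch of one way to do this in recent dplyr (assuming dplyr 1.0.0 or later, where the `{{ }}` embrace operator is available): the function name `scale_by_group`, its arguments, and the output column `scaled` are hypothetical names chosen for illustration.

```r
library(dplyr)

# Hypothetical helper: standardise one column within groups.
# {{ }} forwards the unquoted column names given by the caller.
scale_by_group <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    mutate(
      scaled = ({{ value_col }} - mean({{ value_col }}, na.rm = TRUE)) /
        sd({{ value_col }}, na.rm = TRUE)
    ) %>%
    ungroup()
}

# Usage with a built-in data set: mpg standardised within each cyl group
out <- scale_by_group(mtcars, cyl, mpg)
```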


1 Answer

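The problem seems to be in the base scale() function, which is designed for matrices: given a vector, it returns a one-column matrix with "scaled:center" and "scaled:scale" attributes rather than a plain numeric vector, and that does not appear to play well with a grouped mutate(). Try writing your own.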

    scale_this <- function(x) {
      (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    }

Then this works:

    library("dplyr")

    # reproducible sample data
    set.seed(123)
    n <- 1000
    df <- data.frame(stud_ID = sample(LETTERS, size = n, replace = TRUE),
                     behavioral_scale = runif(n, 0, 10),
                     cognitive_scale = runif(n, 1, 20),
                     affective_scale = runif(n, 0, 1))

    # scale each variable within student
    scaled_data <-
      df %>%
      group_by(stud_ID) %>%
      mutate(behavioral_scale_ind = scale_this(behavioral_scale),
             cognitive_scale_ind = scale_this(cognitive_scale),
             affective_scale_ind = scale_this(affective_scale))
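
If many columns need the same treatment, the per-group scaling can also be written with across(). This is a sketch rather than part of the original answer: it assumes dplyr 1.0.0 or later, uses a smaller illustrative sample, and relies on every column to be scaled having a name ending in "_scale".

```r
library(dplyr)

scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Smaller illustrative sample: 5 students, 100 rows
set.seed(123)
n <- 100
df <- data.frame(stud_ID = sample(LETTERS[1:5], size = n, replace = TRUE),
                 behavioral_scale = runif(n, 0, 10),
                 cognitive_scale = runif(n, 1, 20),
                 affective_scale = runif(n, 0, 1))

# Apply scale_this() to every *_scale column within each student,
# writing the results to new columns with an "_ind" suffix
scaled_data <- df %>%
  group_by(stud_ID) %>%
  mutate(across(ends_with("_scale"), scale_this, .names = "{.col}_ind")) %>%
  ungroup()

# Sanity check: within each student the scaled values have mean 0 and sd 1
check <- scaled_data %>%
  group_by(stud_ID) %>%
  summarise(m = mean(behavioral_scale_ind), s = sd(behavioral_scale_ind))
```

Wrapping the base function as as.numeric(scale(x)) inside mutate() is another option, since as.numeric() drops the matrix dimensions and attributes.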

Or, if you're open to a data.table solution:

    library("data.table")

    setDT(df)

    cols_to_scale <- c("behavioral_scale", "cognitive_scale", "affective_scale")

    df[, lapply(.SD, scale_this), .SDcols = cols_to_scale, keyby = factor(stud_ID)]
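
Note that the data.table call above returns only the grouping column and the scaled columns. A possible variant, not from the original answer, adds the scaled values as new columns next to the originals with `:=`. The object name `dt`, the smaller sample, and the "_ind" suffix are illustrative assumptions.

```r
library(data.table)

scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Smaller illustrative sample: 5 students, 100 rows
set.seed(123)
n <- 100
dt <- data.table(stud_ID = sample(LETTERS[1:5], size = n, replace = TRUE),
                 behavioral_scale = runif(n, 0, 10),
                 cognitive_scale = runif(n, 1, 20),
                 affective_scale = runif(n, 0, 1))

cols_to_scale <- c("behavioral_scale", "cognitive_scale", "affective_scale")
new_cols <- paste0(cols_to_scale, "_ind")

# Add the per-student scaled columns by reference, keeping the originals
dt[, (new_cols) := lapply(.SD, scale_this), by = stud_ID, .SDcols = cols_to_scale]
```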
C8H10N4O2 answered Oct 07 '22 00:10
