Trying to use dplyr to group_by the stud_ID variable in the following data frame, as in this SO question:
    > str(df)
    'data.frame': 4136 obs. of 4 variables:
     $ stud_ID         : chr "ABB112292" "ABB112292" "ABB112292" "ABB112292" ...
     $ behavioral_scale: num 3.5 4 3.5 3 3.5 2 NA NA 1 2 ...
     $ cognitive_scale : num 3.5 3 3 3 3.5 2 NA NA 1 1 ...
     $ affective_scale : num 2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...
I tried the following to obtain scale scores by student (rather than scale scores for observations across all students):
    scaled_data <- df %>%
      group_by(stud_ID) %>%
      mutate(behavioral_scale_ind = scale(behavioral_scale),
             cognitive_scale_ind = scale(cognitive_scale),
             affective_scale_ind = scale(affective_scale))
Here is the result:
    > str(scaled_data)
    Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 4136 obs. of 7 variables:
     $ stud_ID             : chr "ABB112292" "ABB112292" "ABB112292" "ABB112292" ...
     $ behavioral_scale    : num 3.5 4 3.5 3 3.5 2 NA NA 1 2 ...
     $ cognitive_scale     : num 3.5 3 3 3 3.5 2 NA NA 1 1 ...
     $ affective_scale     : num 2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...
     $ behavioral_scale_ind: num [1:12, 1] 0.64 1.174 0.64 0.107 0.64 ...
      ..- attr(*, "scaled:center")= num 2.9
      ..- attr(*, "scaled:scale")= num 0.937
     $ cognitive_scale_ind : num [1:12, 1] 1.17 0.64 0.64 0.64 1.17 ...
      ..- attr(*, "scaled:center")= num 2.4
      ..- attr(*, "scaled:scale")= num 0.937
     $ affective_scale_ind : num [1:12, 1] 0 1.28 0.64 0.64 0 ...
      ..- attr(*, "scaled:center")= num 2.5
      ..- attr(*, "scaled:scale")= num 0.782
The three scaled variables (behavioral_scale
, cognitive_scale
, and affective_scale
) have only 12 observations - the same number of observations for the first student, ABB112292
.
What's going on here? How can I obtain scaled scores by individual?
scale() in R is a generic function that centers and scales the columns of a numeric matrix. The center argument takes either a numeric vector or a logical value. If a numeric vector is supplied, each column of the matrix has the corresponding value from center subtracted from it.
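To make the matrix behaviour concrete, here is a small base-R sketch on a made-up vector (the values are purely illustrative, not taken from the question's data). It shows that scale() hands back a one-column matrix with attributes rather than a plain vector:

```r
# Made-up input vector, for illustration only
x <- c(1, 2, 3, 4, 5)

z <- scale(x)

is.matrix(z)               # scale() returns a matrix, even for vector input
dim(z)                     # one column, one row per element of x
attr(z, "scaled:center")   # the mean that was subtracted
attr(z, "scaled:scale")    # the standard deviation that was divided by

# Dropping the dimensions and attributes gives an ordinary numeric vector
z_vec <- as.numeric(z)
is.matrix(z_vec)
```

This matches the `num [1:12, 1]` and `attr(*, "scaled:center")` entries in the str() output above.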
%>% is the forward pipe operator in R. It provides a mechanism for chaining commands: it forwards a value, or the result of an expression, into the next function call or expression. It is defined by the magrittr package (CRAN) and is heavily used by dplyr (CRAN).
group_by() belongs to the dplyr package and groups the rows of a data frame by one or more variables. On its own, group_by() does not change the data; it only changes how later verbs such as mutate() and summarise() operate.
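As a rough illustration of that last point, the sketch below uses a tiny made-up data frame (the names toy, id and score are hypothetical). The grouping only becomes visible once a verb like mutate() runs on the grouped data:

```r
library(dplyr)

# Hypothetical toy data: two groups with two rows each
toy <- data.frame(
  id    = c("a", "a", "b", "b"),
  score = c(1, 3, 10, 20)
)

out <- toy %>%
  group_by(id) %>%                     # marks the groups, values unchanged
  mutate(group_mean = mean(score)) %>% # mean() is evaluated once per group
  ungroup()                            # drop the grouping when finished

out
```

Each row gets the mean of its own group rather than the overall mean of score.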
dplyr functions use non-standard evaluation. That is why you do not have to quote your variable names when you do something like select(mtcars, mpg). Older versions of dplyr rejected select(mtcars, "mpg"), although current versions accept quoted column names in select(). When you use dplyr inside your own functions, you will likely want to use "standard evaluation", that is, passing column names as values.
The problem seems to be in the base scale() function, which expects a matrix and returns a one-column matrix (with attributes) rather than a plain vector. Try writing your own:
    scale_this <- function(x) {
      (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    }
Then this works:
    library("dplyr")

    # reproducible sample data
    set.seed(123)
    n = 1000
    df <- data.frame(stud_ID = sample(LETTERS, size = n, replace = TRUE),
                     behavioral_scale = runif(n, 0, 10),
                     cognitive_scale = runif(n, 1, 20),
                     affective_scale = runif(n, 0, 1))

    scaled_data <- df %>%
      group_by(stud_ID) %>%
      mutate(behavioral_scale_ind = scale_this(behavioral_scale),
             cognitive_scale_ind = scale_this(cognitive_scale),
             affective_scale_ind = scale_this(affective_scale))
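If you would rather keep base scale() than write a helper, one possible variation (not part of the original answer) is to wrap it in as.numeric() so each group yields a plain vector. The sketch below assumes dplyr 1.0 or later for across(), and the toy data and the cols_to_scale name are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data: two students with three observations each
toy <- data.frame(
  stud_ID          = rep(c("A", "B"), each = 3),
  behavioral_scale = c(1, 2, 3, 10, 20, 30),
  cognitive_scale  = c(2, 4, 6, 5, 5, 8)
)

cols_to_scale <- c("behavioral_scale", "cognitive_scale")

scaled_toy <- toy %>%
  group_by(stud_ID) %>%
  # as.numeric() drops the matrix dimensions and attributes added by scale()
  mutate(across(all_of(cols_to_scale),
                ~ as.numeric(scale(.x)),
                .names = "{.col}_ind")) %>%
  ungroup()

scaled_toy
```

scale() skips NA values when it computes the centre and spread, so this should behave much like scale_this() on columns with missing data.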
Or, if you're open to a data.table solution:
    library("data.table")
    setDT(df)

    cols_to_scale <- c("behavioral_scale", "cognitive_scale", "affective_scale")
    df[, lapply(.SD, scale_this), .SDcols = cols_to_scale, keyby = factor(stud_ID)]
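The call above returns only the grouping column and the scaled columns. If you want the scaled values added alongside the originals, as in the dplyr version, a sketch using data.table's := assignment could look like this (the toy data and the new_cols name are hypothetical, and scale_this is repeated so the example runs on its own):

```r
library(data.table)

scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Hypothetical toy data: two students with three observations each
dt <- data.table(
  stud_ID          = rep(c("A", "B"), each = 3),
  behavioral_scale = c(1, 2, 3, 10, 20, 30),
  cognitive_scale  = c(2, 4, 6, 5, 5, 8)
)

cols_to_scale <- c("behavioral_scale", "cognitive_scale")
new_cols <- paste0(cols_to_scale, "_ind")

# := adds the new columns by reference, computed separately for each student
dt[, (new_cols) := lapply(.SD, scale_this), .SDcols = cols_to_scale, by = stud_ID]

dt
```

Because := modifies dt in place, there is no need to assign the result back to a new object.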