Transmute over multiple columns in dplyr

Tags: r, dplyr

I have a dplyr question: How do I use transmute over each column without writing each column out by hand? I.e. is there something like transmute_each()?

Using dplyr, I want to compute the z-score of each column for the MWE below:

library(dplyr)

tickers <- c(rep(1,10),rep(2,10))
df <- data.frame(cbind(tickers,1:20,2:21,2:21,4:23,3:22))
colnames(df) <- c("tickers","col1","col2","col3","col4","col5")
df %>% group_by(tickers)

Is there a simple way to then use transmute to achieve the following:

for(i in 2:ncol(df)){
  # parentheses are needed: subtract the mean first, then divide by the sd
  df[,i] <- (df[,i] - mean(df[,i]))/sd(df[,i])
}

Many thanks

Nick asked Sep 23 '15 11:09


2 Answers

Now that there is a transmute_at() function (as of dplyr 0.7), you can do the following:

df %>% 
    group_by(tickers) %>% 
    transmute_at(.vars = vars(starts_with("col")),
                 .funs = funs(scale(.))) %>% 
    ungroup()

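Note that this uses the scale() function from base R, which by default centers a numeric vector on its mean and divides by its standard deviation, i.e. converts it to z-scores. scale() returns a one-column matrix rather than a plain vector, so wrap it in as.numeric() if you need ordinary numeric columns.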

Also, the use of vars() in the .vars argument allows you to use all the helper functions that are available for dplyr's select(), such as one_of(), ends_with(), etc.

Finally, instead of writing funs(scale(.)) here, since you're using a simple function in the .funs argument, you can just write .funs = scale.
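
For readers on newer dplyr versions: funs() has been deprecated since dplyr 0.8.0, and the scoped _at verbs were superseded by across() in dplyr 1.0.0. A minimal sketch of the same idea, assuming dplyr >= 1.0.0 is installed. The result name z_scores is only illustrative, and the data frame is rebuilt here so the block runs on its own:

```r
library(dplyr)

# Example data from the question, rebuilt so this block is self-contained
df <- data.frame(
  tickers = rep(1:2, each = 10),
  col1 = 1:20, col2 = 2:21, col3 = 2:21, col4 = 4:23, col5 = 3:22
)

z_scores <- df %>%
  group_by(tickers) %>%
  # across() applies the function to every selected column, group by group;
  # as.numeric() drops the one-column matrix structure that scale() returns
  transmute(across(starts_with("col"), ~ as.numeric(scale(.x)))) %>%
  ungroup()
```

Because the data is grouped, each column is standardised within each ticker rather than over the whole data frame.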

bschneidr answered Nov 07 '22 05:11


I solved this using the following:

df %>%
   group_by(tickers) %>%
   # older dplyr releases called this argument .cols; it was later renamed .vars
   mutate_at(.vars = vars(matches("col")),
             .funs = funs((. - mean(.))/sd(.)))
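
One way to sanity-check the result is to confirm that each column has mean 0 and standard deviation 1 within every ticker. A minimal sketch, assuming dplyr >= 0.8.0 so that a formula lambda can stand in for funs(). The names standardised and check are only illustrative:

```r
library(dplyr)

# Example data from the question, rebuilt so this block is self-contained
df <- data.frame(
  tickers = rep(1:2, each = 10),
  col1 = 1:20, col2 = 2:21, col3 = 2:21, col4 = 4:23, col5 = 3:22
)

# Same manual z-score as above, written with a formula lambda
standardised <- df %>%
  group_by(tickers) %>%
  mutate_at(vars(matches("col")), ~ (. - mean(.)) / sd(.)) %>%
  ungroup()

# Per-group mean and sd of one standardised column
check <- standardised %>%
  group_by(tickers) %>%
  summarise(mean_col1 = mean(col1), sd_col1 = sd(col1))
```

Up to floating-point error, mean_col1 should be 0 and sd_col1 should be 1 for both tickers.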

Nick answered Nov 07 '22 07:11