Transmute over multiple columns in dplyr

Tags:

r

dplyr

I have a dplyr question: how do I apply transmute to every column without writing each column out by hand? That is, is there something like transmute_each()?

Using dplyr, I want to get the z-score of each column in the minimal working example (MWE) below:

library(dplyr)  # provides %>% and group_by()

tickers <- c(rep(1,10),rep(2,10))
df <- data.frame(cbind(tickers,rep(1:20),rep(2:21),rep(2:21),rep(4:23),rep(3:22)))
colnames(df) <- c("tickers","col1","col2","col3","col4","col5")
df %>% group_by(tickers)

Is there a simple way to then use transmute to achieve the following:

# Standardize every column except the first (tickers); this loop ignores the grouping
for(i in 2:ncol(df)){
  df[,i] <- (df[,i] - mean(df[,i])) / sd(df[,i])
}

Many thanks
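Operator precedence matters for the z-score formula: in R, `/` binds more tightly than `-`, so `x - mean(x)/sd(x)` subtracts the single number `mean(x)/sd(x)` from every element instead of standardizing. A minimal sketch on a toy vector (the values are illustrative only):

```r
x <- c(1, 2, 3, 4, 5)

# Without parentheses R computes x - (mean(x) / sd(x)): a shift, not a z-score
wrong <- x - mean(x) / sd(x)

# With parentheses R computes the z-score (x - mean(x)) / sd(x)
z <- (x - mean(x)) / sd(x)

print(wrong)
print(z)
```

The shifted vector keeps the original standard deviation, whereas `z` has mean 0 and standard deviation 1.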


asked Sep 23 '15 11:09

Nick

2 Answers

Now that there is a transmute_at() function (as of dplyr 0.7), you can do the following:

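df %>% 
    group_by(tickers) %>%                           # z-scores are computed within each ticker
    transmute_at(.vars = vars(starts_with("col")),  # every column whose name starts with "col"
                 .funs = funs(scale(.))) %>%        # standardize each column (funs() is deprecated since dplyr 0.8.0)
    ungroup()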

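Note that this uses the scale() function from base R, which by default centers a numeric vector on its mean and divides it by its standard deviation, i.e. converts it into z-scores. Because the data are grouped by tickers, the z-scores are computed separately within each ticker.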
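A quick check that scale() with its defaults (`center = TRUE`, `scale = TRUE`) gives the same numbers as the hand-written z-score formula; the vector is arbitrary example data:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# scale() returns a one-column matrix carrying "scaled:center"/"scaled:scale"
# attributes, so as.vector() is used to get a plain numeric vector for comparison
z_scale <- as.vector(scale(x))

# The same z-scores computed by hand
z_manual <- (x - mean(x)) / sd(x)

all.equal(z_scale, z_manual)
#> [1] TRUE
```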

Also, the use of vars() in the .vars argument allows you to use all the helper functions that are available for dplyr's select(), such as one_of(), ends_with(), etc.
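As a sketch of swapping in a different helper, the call below uses one_of() to pick two columns by name from a rebuilt copy of the question's MWE. It assumes dplyr 0.8.0 or later, where `.funs` also accepts a formula lambda such as `~ as.vector(scale(.))`; the as.vector() wrapper is an optional choice here to get plain numeric columns instead of the one-column matrices that scale() returns.

```r
library(dplyr)

# Same values as the MWE in the question, built with named columns directly
tickers <- c(rep(1, 10), rep(2, 10))
df <- data.frame(tickers = tickers, col1 = 1:20, col2 = 2:21, col3 = 2:21,
                 col4 = 4:23, col5 = 3:22)

res <- df %>%
  group_by(tickers) %>%
  # one_of() selects columns by name; any other select() helper could go here
  transmute_at(.vars = vars(one_of("col1", "col3")),
               .funs = ~ as.vector(scale(.))) %>%
  ungroup()

names(res)
```

The grouping column `tickers` is kept alongside the two transformed columns.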

Finally, instead of writing funs(scale(.)) here, since you're using a simple function in the .funs argument, you can just write .funs = scale.
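With the bare function passed as `.funs = scale`, the answer's call looks like the sketch below (again on a rebuilt copy of the MWE). Within each ticker, `col1` holds ten consecutive integers, so its z-scores should match `scale(1:10)` in both groups.

```r
library(dplyr)

# Same values as the MWE in the question, built with named columns directly
tickers <- c(rep(1, 10), rep(2, 10))
df <- data.frame(tickers = tickers, col1 = 1:20, col2 = 2:21, col3 = 2:21,
                 col4 = 4:23, col5 = 3:22)

res <- df %>%
  group_by(tickers) %>%
  # scale() is called on each selected column, once per ticker group
  transmute_at(.vars = vars(starts_with("col")), .funs = scale) %>%
  ungroup()

res
```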

answered Nov 07 '22 05:11

bschneidr

I solved this using the following:

df %>%  
   group_by(tickers) %>%  
   mutate_at(.funs = funs((. - mean(.))/sd(.)),   # z-score: (x - mean) / sd, per ticker
             .vars = vars(matches("col")))          # .vars replaced the older, deprecated .cols argument
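In dplyr 1.0.0 and later, the scoped `_at` verbs are superseded by across(). A minimal sketch of the same per-ticker standardization written with across(), assuming dplyr 1.0.0 or newer and a rebuilt copy of the MWE:

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which introduced across()

# Same values as the MWE in the question, built with named columns directly
tickers <- c(rep(1, 10), rep(2, 10))
df <- data.frame(tickers = tickers, col1 = 1:20, col2 = 2:21, col3 = 2:21,
                 col4 = 4:23, col5 = 3:22)

res <- df %>%
  group_by(tickers) %>%
  # Apply the z-score lambda to every column whose name starts with "col"
  mutate(across(starts_with("col"), ~ (.x - mean(.x)) / sd(.x))) %>%
  ungroup()

res
```

Like mutate_at(), this keeps all six columns; use transmute() in place of mutate() to drop any that are not created or grouped on.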

answered Nov 07 '22 07:11

Nick
