Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

add column with row wise mean over selected columns using dplyr

Tags:

regex

r

dplyr

I have a data frame which contains several variables which got measured at different time points (e.g., test1_tp1, test1_tp2, test1_tp3, test2_tp1, test2_tp2,...).

I am now trying to use dplyr to add a new column to a data frame that calculates the row wise mean over a selection of these columns (e.g., mean over all time points for test1).

  1. I struggle even with the syntax for calculating the mean over explicitly named columns. What I tried without success was:

data %>% ... %>% mutate(test1_mean = mean(test1_tp1, test1_tp2, test1_tp3, na.rm = TRUE)

  1. I would further like to use regex/wildcards to select the column names, so something like

data %>% ... %>% mutate(test1_mean = mean(matches("test1_.*"), na.rm = TRUE)

like image 710
user21932 Avatar asked Jan 26 '15 21:01

user21932


People also ask

How do I sum across rows in R dplyr?

Syntax: mutate(new-col-name = rowSums(.)) The rowSums() method is used to calculate the sum of each row and then append the value at the end of each row under the new column name specified. The argument . is used to apply the function over all the cells of the data frame. Syntax: rowSums(.)

How do I get the mean of multiple columns in R?

To find the mean of multiple columns based on multiple grouping columns in R data frame, we can use summarise_at function with mean function.

What does %>% do in dplyr?

%>% is called the forward pipe operator in R. It provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. It is defined by the package magrittr (CRAN) and is heavily used by dplyr (CRAN).

What does rowwise () do in R?

rowwise() allows you to compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn't exist. Most dplyr verbs preserve row-wise grouping. The exception is summarise() , which return a grouped_df.


1 Answers

You can use starts_with inside select to find all columns starting with a certain string.

data %>%
  mutate(test1 = select(., starts_with("test1_")) %>%
           rowMeans(na.rm = TRUE))
like image 75
Sven Hohenstein Avatar answered Oct 19 '22 06:10

Sven Hohenstein