 

rowMeans function in dplyr

Tags:

r

dplyr

I have been trying to calculate rowMeans within dplyr's mutate function, but I keep getting errors. Below are an example DATA set and the desired RESULT.

DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"), 
                  DATE = c("1","1","2","2","3","3","3","4","4"), 
                  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"), 
                    DATE = c("1","1","2","2","3","3","3","4","4"), 
                    STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                    STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                    NAYSA = c(1.5, 3, 45, 60, 150, 300, 450, 7500, 9000))

The code I have written begins by randomly sampling STUFF and STUFF2. Then I would like to calculate the row means of STUFF and STUFF2 and store the result in a new column. I could accomplish this task using tidyr, but I would have to redo a large number of variables. I could also use base R, but I would prefer a solution using dplyr's mutate function. Thanks in advance.
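For comparison, here is a minimal base R sketch of the row-mean step described above. It assumes the DATA data frame from the question (rebuilt here so the block runs on its own) and leaves out the random resampling, so the new column can be checked against NAYSA in RESULT.

```r
# Rebuild the example data from the question
DATA <- data.frame(SITE   = c("A","A","A","A","B","B","B","C","C"),
                   DATE   = c("1","1","2","2","3","3","3","4","4"),
                   STUFF  = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# Base R: mean of the two numeric columns for each row, selected by name
DATA$NAYSA <- rowMeans(DATA[, c("STUFF", "STUFF2")])
DATA$NAYSA
```

Selecting the columns by name rather than by position keeps the character columns SITE and DATE out of rowMeans(), which only accepts numeric input.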

library(dplyr)

RESULT = group_by(DATA, SITE, DATE) %>%
  mutate(STUFF = sample(STUFF, replace = TRUE),
         STUFF2 = sample(STUFF2, replace = TRUE)) %>%
  # Each of these final steps returns an error (tried one at a time):
  mutate(NAYSA = rowMeans(DATA[,-1:-2]))
  # mutate(NAYSA = rowMeans(.[,-1:-2]))
  # mutate(NAYSA = rowMeans(.))
Vesuccio asked Mar 16 '15 17:03



People also ask

How do you use rowMeans?

To find the row means of a data frame's columns while ignoring missing values, call rowMeans with na.rm = TRUE. For example, if a data frame called df contains five columns and some of the values are missing, the row means are calculated with rowMeans(df, na.rm = TRUE).
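A small base R illustration of the difference na.rm makes, using a hypothetical three-column data frame (the name df and its values are made up for this example):

```r
# Hypothetical data frame with missing values, for illustration only
df <- data.frame(a = c(1, NA, 3),
                 b = c(2, 4, NA),
                 c = c(3, 8, 9))

with_na    <- rowMeans(df)                # any NA in a row makes that row's mean NA
without_na <- rowMeans(df, na.rm = TRUE)  # NAs are dropped before averaging each row
with_na
without_na
```

With na.rm = TRUE, the second row is averaged over its two non-missing values (4 and 8), not over all three columns.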

What does rowwise () do in R?

rowwise() allows you to compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn't exist. Most dplyr verbs preserve row-wise grouping.
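A minimal sketch of rowwise() on a hypothetical tibble. It assumes dplyr 1.0.0 or later, where c_across() gathers the selected columns of the current row into a single vector, so mean() works on one row at a time.

```r
library(dplyr)

# Hypothetical example data
df <- tibble(id = 1:3, x = c(1, 2, 3), y = c(3, 6, 9))

res <- df %>%
  rowwise() %>%                            # each row becomes its own group
  mutate(m = mean(c_across(c(x, y)))) %>%  # mean of x and y within that row
  ungroup()                                # remove the row-wise grouping again
res$m
```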

How do I sum rows in dplyr?

Syntax: mutate(new_col_name = rowSums(.)). The rowSums() function calculates the sum of each row, and mutate() appends the result as a new column with the specified name. With the %>% pipe, the . argument stands for the whole data frame, so the sum is taken across all of its columns, which must therefore all be numeric.
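A small sketch of that pattern on a hypothetical, ungrouped, all-numeric data frame. Note that the . placeholder belongs to the magrittr pipe %>% (re-exported by dplyr); the native |> pipe does not support it.

```r
library(dplyr)

# Hypothetical all-numeric data; rowSums(.) would fail on character columns
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

res <- df %>%
  mutate(total = rowSums(.))  # `.` is df as piped in, so total = x + y per row
res$total
```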

How do I get the mean of multiple rows in R?

The rowMeans() function in R calculates the mean of each row of a matrix or data frame, taken across its columns.
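To average only some of the columns in each row, subset the columns before calling rowMeans(). Here is a base R sketch with a hypothetical data frame that also contains a character column:

```r
# Hypothetical data: one label column plus three numeric columns
df <- data.frame(label = c("p", "q"),
                 a = c(1, 4), b = c(3, 8), c = c(5, 12))

# Mean of each row over all numeric columns (the label column is left out)
all_cols <- rowMeans(df[, c("a", "b", "c")])

# Mean of each row over a subset of the columns
two_cols <- rowMeans(df[, c("a", "b")])
all_cols
two_cols
```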


2 Answers

You need dplyr's rowwise() function to do that. Your data are random (because of sample()), so you will get different results each time, but you will see that it works:

library(dplyr)

group_by(DATA, SITE, DATE) %>%
  mutate(STUFF = sample(STUFF, replace = TRUE),
         STUFF2 = sample(STUFF2, replace = TRUE)) %>%
  # Treat each row as its own group so mean() sees one row's values
  rowwise() %>%
  mutate(NAYSA = mean(c(STUFF, STUFF2)))

Output:

Source: local data frame [9 x 5]
Groups: <by row>

  SITE DATE STUFF STUFF2  NAYSA
1    A    1     1      2    1.5
2    A    1     2      2    2.0
3    A    2    30     80   55.0
4    A    2    30     60   45.0
5    B    3   200    600  400.0
6    B    3   300    200  250.0
7    B    3   100    600  350.0
8    C    4  5000  12000 8500.0
9    C    4  6000  10000 8000.0

As you can see, it calculates the mean of STUFF and STUFF2 separately for each row.
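Because sample() makes the output change from run to run, one way to check the rowwise() approach is to drop the resampling step. The following sketch assumes the question's DATA (rebuilt here so it runs on its own). Without resampling, the computed column should equal NAYSA in the question's RESULT.

```r
library(dplyr)

DATA <- data.frame(SITE   = c("A","A","A","A","B","B","B","C","C"),
                   DATE   = c("1","1","2","2","3","3","3","4","4"),
                   STUFF  = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# Same row-wise step as the answer, minus the random resampling
checked <- DATA %>%
  rowwise() %>%
  mutate(NAYSA = mean(c(STUFF, STUFF2))) %>%
  ungroup()

checked$NAYSA  # same values as the NAYSA column of RESULT
```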

LyzandeR answered Oct 12 '22 21:10



@GregF Yep, ungroup() was the key. Thanks.

Working code

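library(dplyr)

RESULT = group_by(DATA, SITE, DATE) %>%
  # Resample within each SITE/DATE group
  mutate(STUFF = sample(STUFF, replace = TRUE),
         STUFF2 = sample(STUFF2, replace = TRUE)) %>%
  # One value per row of the whole data only fits an ungrouped mutate()
  ungroup() %>%
  # Mean of every column except the first two (SITE, DATE)
  mutate(NAYSA = rowMeans(.[,-1:-2]))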
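Here is a variant of the working code that selects the columns by name instead of by position, so it keeps working if the columns are reordered. It is a sketch that assumes the question's DATA (rebuilt here). set.seed() is added only to make the resampling reproducible. Whatever values are drawn, NAYSA should equal the mean of STUFF and STUFF2 in the same row.

```r
library(dplyr)

DATA <- data.frame(SITE   = c("A","A","A","A","B","B","B","C","C"),
                   DATE   = c("1","1","2","2","3","3","3","4","4"),
                   STUFF  = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

set.seed(1)  # make the random resampling reproducible
res <- DATA %>%
  group_by(SITE, DATE) %>%
  mutate(STUFF = sample(STUFF, replace = TRUE),
         STUFF2 = sample(STUFF2, replace = TRUE)) %>%
  ungroup() %>%
  mutate(NAYSA = rowMeans(.[, c("STUFF", "STUFF2")]))  # columns chosen by name

# Row by row, NAYSA is the mean of the two (resampled) columns
all.equal(unname(res$NAYSA), (res$STUFF + res$STUFF2) / 2)
```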
Vesuccio answered Oct 12 '22 20:10
