I have been trying to calculate rowMeans within dplyr's mutate function, but keep getting errors. Below is an example DATA set and the desired RESULT.
DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
DATE = c("1","1","2","2","3","3","3","4","4"),
STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))
RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
DATE = c("1","1","2","2","3","3","3","4","4"),
STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
NAYSA = c(1.5, 3, 45, 60, 150, 300, 450, 7500, 9000))
The code I have written begins by randomly sampling STUFF and STUFF2. Then I would like to calculate the rowMeans of STUFF and STUFF2 and export the result to a new column. I could accomplish this task using tidyr, but would have to redo a larger number of variables. Furthermore, I could use base R, but would prefer to find a solution using the mutate function in dplyr. Thanks in advance.
RESULT = group_by(DATA, SITE, DATE) %>%
mutate(STUFF=sample(STUFF,replace= TRUE), STUFF2 = sample(STUFF2,replace= TRUE))%>%
# These approaches return errors
mutate(NAYSA = rowMeans(DATA[,-1:-2]))
mutate(NAYSA = rowMeans(.[,-1:-2]))
mutate (NAYSE = rowMeans(.))
To compute row means while ignoring missing values, use the rowMeans function with na.rm = TRUE. For example, if a data frame called df contains five columns and some of the values are missing, the row means can be calculated with the command: rowMeans(df, na.rm = TRUE).
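As a minimal sketch of the effect of na.rm, the example below uses a small made-up data frame (the name df_na and its values are purely illustrative, not taken from the question):

```r
# Hypothetical data with some missing values (illustration only)
df_na <- data.frame(a = c(1, NA, 3),
                    b = c(2, 4, NA),
                    c = c(3, 6, 9))

# Without na.rm, any row containing an NA yields NA
means_default <- rowMeans(df_na)

# With na.rm = TRUE, each row mean uses only the non-missing values
means_narm <- rowMeans(df_na, na.rm = TRUE)
print(means_narm)
```

Here the second row is averaged over 4 and 6 only, and the third over 3 and 9 only.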
rowwise() allows you to compute on a data frame a row-at-a-time. This is most useful when a vectorised function doesn't exist. Most dplyr verbs preserve row-wise grouping.
Syntax: mutate(new-col-name = rowSums(.)) The rowSums() method is used to calculate the sum of each row and then append the value at the end of each row under the new column name specified. The argument . is used to apply the function over all the cells of the data frame.
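A small sketch of that pattern, assuming the magrittr pipe (%>%) re-exported by dplyr, since the . placeholder refers to the piped data frame there and would not work the same way with the native |> pipe. The data frame df_num and the column name total are made up for illustration, and every column must be numeric for rowSums(.) to work:

```r
library(dplyr)

# Hypothetical all-numeric data frame (illustration only)
df_num <- data.frame(x = c(1, 2, 3),
                     y = c(4, 5, 6))

# `.` is the whole data frame passed in by %>%, so rowSums sees both columns
df_total <- df_num %>%
  mutate(total = rowSums(.))
print(df_total)
```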
The rowMeans() function in R can be used to calculate the mean of several rows of a matrix or data frame in R.
You need the rowwise function in dplyr to do that. Your data is random (because of sample) so it produces different results, but you will see that it works:
library(dplyr)
group_by(DATA, SITE, DATE) %>%
mutate(STUFF=sample(STUFF,replace= TRUE), STUFF2 = sample(STUFF2,replace= TRUE))%>%
rowwise() %>%
mutate(NAYSA = mean(c(STUFF,STUFF2)))
Output:
Source: local data frame [9 x 5]
Groups: <by row>
SITE DATE STUFF STUFF2 NAYSA
1 A 1 1 2 1.5
2 A 1 2 2 2.0
3 A 2 30 80 55.0
4 A 2 30 60 45.0
5 B 3 200 600 400.0
6 B 3 300 200 250.0
7 B 3 100 600 350.0
8 C 4 5000 12000 8500.0
9 C 4 6000 10000 8000.0
As you can see, it calculates the mean for each row from STUFF and STUFF2.
@GregF Yep....ungroup() was the key. Thanks.
Working code:
RESULT = group_by(DATA, SITE, DATE) %>%
mutate(STUFF = sample(STUFF,replace= TRUE),
STUFF2 = sample(STUFF2,replace= TRUE)) %>%
ungroup() %>%
mutate(NAYSA = rowMeans(.[,-1:-2]))
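For comparison, here is a sketch of a variant that names the columns explicitly instead of indexing the piped data frame by position. It is not from the thread: it relies on cbind() to build a two-column matrix inside mutate, which should work whether or not the data is still grouped. The sampling step is left out so the result is deterministic, and RESULT2 is a hypothetical name used only here:

```r
library(dplyr)

DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# cbind() makes a matrix from the two columns (per group if grouped),
# and rowMeans() averages across that matrix row by row
RESULT2 <- DATA %>%
  group_by(SITE, DATE) %>%
  mutate(NAYSA = rowMeans(cbind(STUFF, STUFF2))) %>%
  ungroup()
print(RESULT2)
```

Naming the columns avoids depending on column positions, at the cost of listing each variable, which may matter if many columns are involved.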