I have grouped data in R using the aggregate function.
Avg=aggregate(x$a, by=list(x$b,x$c),FUN= mean)
This gives me the mean for all the values of 'a' grouped by 'b' and 'c' of data frame 'x'.
Now, instead of averaging all values of 'a', I want to average only the 5 largest values of 'a' within each group of 'b' and 'c'.
Sample data set
a b c
10 G 3
20 G 3
22 G 3
10 G 3
15 G 3
25 G 3
30 G 3
The aggregate call above gives me:
Group.1 Group.2 x
G 3 18.86
But I want to average only the 5 largest values of 'a':
Group.1 Group.2 x
G 3 22.40
I cannot work out how to fit the top-5 selection below, which I am currently using, into the aggregate call:
index <- order(vector, decreasing = TRUE)[1:5]
vector[index]
Can anyone please shed some light on how to do this?
The process involves two stages. First, collate the individual cases of raw data together with a grouping variable. Second, perform whichever calculation you want on each group of cases.
To use the aggregate function for a mean in R, pass the numerical variable as the first argument, the categorical variables (as a list) as the second, and the function to apply (in this case mean) as the third. An alternative is to specify a formula of the form: numerical ~ categorical.
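As a rough illustration of the two calling styles described above, here is a minimal sketch that rebuilds the sample data from the question as a data frame named x and computes the plain group mean both ways. The result names avg_by and avg_formula are only for illustration.

```r
# Sample data from the question (b and c are constant, so they are recycled)
x <- data.frame(
  a = c(10, 20, 22, 10, 15, 25, 30),
  b = "G",
  c = 3
)

# Argument style: values first, grouping variables as a list, then the function
avg_by <- aggregate(x$a, by = list(x$b, x$c), FUN = mean)

# Formula style: numerical ~ categorical, with the data frame passed as data
avg_formula <- aggregate(a ~ b + c, data = x, FUN = mean)

print(avg_by)       # columns are named Group.1, Group.2, x
print(avg_formula)  # columns keep their original names b, c, a
```

Both calls should give the same mean for the single group (132 / 7, roughly 18.857); the difference is mainly in how the result columns are named.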
You can order the data, get the top 5 entries (using head) and then apply the mean:
aggregate(x$a, by=list(x$b,x$c),FUN= function(x) mean(head(x[order(-x)], 5)))
# Group.1 Group.2 x
#1 G 3 22.4
If you want to do this with a custom function, I would do it like this:
myfunc <- function(vec, n){
mean(head(vec[order(-vec)], n))
}
aggregate(x$a, by=list(x$b,x$c),FUN= function(z) myfunc(z, 5))
# Group.1 Group.2 x
#1 G 3 22.4
I actually prefer using the formula style in aggregate, which would look like this (I also use with() to be able to refer to the column names directly without using x$ each time):
with(x, aggregate(a ~ b + c, FUN= function(z) myfunc(z, 5)))
# b c a
#1 G 3 22.4
In this function, the parameter z is passed each a-vector based on groups of b and c. Does that make more sense now? Also note that it doesn't return an integer here but a numeric (decimal, 22.4 in this case) value.
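To see how z is filled group by group, here is a small sketch with more than one group. The data frame x2 is hypothetical: the G/3 rows are the sample from the question, while the H/1 rows are made up for illustration and deliberately contain fewer than 5 values.

```r
# Hypothetical data: group G/3 is the question's sample, group H/1 is invented
x2 <- data.frame(
  a = c(10, 20, 22, 10, 15, 25, 30, 4, 8),
  b = c(rep("G", 7), rep("H", 2)),
  c = c(rep(3, 7), rep(1, 2))
)

# Mean of the n largest values; head() returns fewer than n values
# when the vector is shorter than n
myfunc <- function(vec, n) {
  mean(head(vec[order(-vec)], n))
}

# z receives the a-values of one (b, c) group at a time
res <- aggregate(a ~ b + c, data = x2, FUN = function(z) myfunc(z, 5))
print(res)
```

For group G/3 this averages 30, 25, 22, 20 and 15, giving 22.4. Group H/1 has only two values, so head() returns both and the result is simply their mean, 6.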