Aggregating data in R with a user-defined function

Tags:

r

aggregate

I have grouped data in R using the aggregate function.

Avg <- aggregate(x$a, by = list(x$b, x$c), FUN = mean)

This gives me the mean for all the values of 'a' grouped by 'b' and 'c' of data frame 'x'.

Now, instead of averaging all values of 'a', I want to average only the 5 largest values of 'a' within each group of 'b' and 'c'.

Sample data set

a    b    c
10   G    3 
20   G    3 
22   G    3
10   G    3 
15   G    3
25   G    3
30   G    3

With this data, the aggregate call above gives me:

Group.1    Group.2    x
  G          3       18.86

But I want the average of only the 5 largest values of 'a':

Group.1    Group.2    x
  G          3       22.40

I have not been able to fit the following top-5 selection into the aggregate call:

index <- order(vector, decreasing = TRUE)[1:5]
vector[index]

Could anyone explain how to do this?


asked Aug 21 '14 16:08

user3812709

1 Answer

You can order the data, get the top 5 entries (using head) and then apply the mean:

aggregate(x$a, by = list(x$b, x$c), FUN = function(v) mean(head(v[order(-v)], 5)))
#  Group.1 Group.2    x
#1       G       3 22.4
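
To try this yourself, here is a minimal reproducible sketch. It assumes the data frame x holds exactly the seven sample rows from the question; the name res is only for illustration.

```r
# Sample data from the question (assumed to be the complete data set)
x <- data.frame(
  a = c(10, 20, 22, 10, 15, 25, 30),
  b = "G",
  c = 3
)

# Mean of the 5 largest values of a within each (b, c) group:
# order(-v) sorts descending, head(..., 5) keeps the first five
res <- aggregate(x$a, by = list(x$b, x$c),
                 FUN = function(v) mean(head(v[order(-v)], 5)))
print(res)
```

The five largest values are 30, 25, 22, 20 and 15, which sum to 112, so the group mean is 22.4.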

If you want to do this with a custom function, I would do it like this:

myfunc <- function(vec, n){
  mean(head(vec[order(-vec)], n))
}

aggregate(x$a, by = list(x$b, x$c), FUN = function(z) myfunc(z, 5))
#  Group.1 Group.2    x
#1       G       3 22.4
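
One thing you may want to check is how the helper behaves on small groups and on missing values. The sketch below uses a hypothetical variant, top_n_mean (my name, not part of the code above), built on sort(), which drops NA values by default. With the order()-based myfunc, a group such as c(4, NA, 8) would keep the NA and return NA.

```r
# Hypothetical variant of myfunc: sort() removes NA values by default
top_n_mean <- function(vec, n) {
  mean(head(sort(vec, decreasing = TRUE), n))
}

r1 <- top_n_mean(c(10, 20, 22, 10, 15, 25, 30), 5)  # top 5 of the sample data
r2 <- top_n_mean(c(4, 8), 5)       # fewer than n values: head() returns all of them
r3 <- top_n_mean(c(4, NA, 8), 5)   # the NA is dropped before averaging
print(c(r1, r2, r3))
```

Whether silently dropping NA is what you want depends on your data, so treat this as one option rather than a fix.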

I actually prefer using the formula style in aggregate which would look like this (I also use with() to be able to refer to the column names directly without using x$ each time):

with(x, aggregate(a ~ b + c, FUN= function(z) myfunc(z, 5)))
#  b c    a
#1 G 3 22.4
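
As a sketch of two equivalent spellings: the formula method of aggregate also takes a data argument, so with() is optional, and extra arguments after FUN are passed on to it. This assumes the sample data from the question; res1 and res2 are illustrative names.

```r
x <- data.frame(a = c(10, 20, 22, 10, 15, 25, 30), b = "G", c = 3)

myfunc <- function(vec, n) {
  mean(head(vec[order(-vec)], n))
}

# Same call as above, using data = x instead of with()
res1 <- aggregate(a ~ b + c, data = x, FUN = function(z) myfunc(z, 5))

# Extra arguments after FUN are forwarded to it, so n can be given directly
res2 <- aggregate(a ~ b + c, data = x, FUN = myfunc, n = 5)

print(res1)
print(res2)
```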

Here, the anonymous function receives, as z, the vector of a values for each combination of b and c. Does that make more sense now? Also note that the result is not an integer but a numeric (decimal) value, 22.4 in this case.
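
If it helps to see what z looks like, you can reproduce the grouping by hand with split(). This is only a conceptual sketch, not a description of how aggregate is implemented, and the two extra rows in group H / 4 are made up for illustration.

```r
# Sample data plus a hypothetical second group (H, 4)
x <- data.frame(
  a = c(10, 20, 22, 10, 15, 25, 30, 1, 2),
  b = c(rep("G", 7), "H", "H"),
  c = c(rep(3, 7), 4, 4)
)

# One vector of a values per (b, c) combination; drop = TRUE removes empty groups
groups <- split(x$a, list(x$b, x$c), drop = TRUE)
print(groups)

# Apply the same top-5 mean to each vector, as FUN does for each z
res <- sapply(groups, function(z) mean(head(z[order(-z)], 5)))
print(res)

# The result is stored as a double, not an integer
print(is.integer(res))
print(is.numeric(res))
```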

answered Nov 03 '22 12:11

talat

