Aggregate multiple columns at once [duplicate]

Tags:

r

aggregate

I have a data frame like so:

x <-
  id1 id2 val1 val2 val3 val4
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8

I want to aggregate the above by id1 and id2 and get the means of val1, val2, val3 and val4 at the same time.

How do I do this?

This is what I currently have, but it works for only one column:

agg <- aggregate(x$val1, list(id1 = x$id1, id2 = x$id2), mean)
names(agg)[3] <- c("val1")  # Rename the column

Also, how do I rename the output columns (the means) within the same statement?

asked Dec 30 '15 05:12

Rookie

2 Answers

We can use the formula method of aggregate. The variables on the right-hand side of ~ are the grouping variables, while the . on the left-hand side stands for all other variables in df1 (from the example, we assume that the mean is needed for every column except the grouping columns). Then we specify the dataset and the function (mean).

aggregate(.~id1+id2, df1, mean)
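For the second part of the question (renaming the mean columns in the same statement), one option with the formula method is to wrap the left-hand side in cbind() with named arguments; aggregate() carries those names through to the result. A minimal sketch, assuming the same values as df1 in the data section (rebuilt here with data.frame() for brevity); the mean_ prefixes are illustrative names only:

```r
# Same values as df1 in the "data" section, rebuilt with data.frame() for brevity
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

# Named arguments inside cbind() become the names of the summary columns;
# the right-hand side of ~ gives the grouping columns id1 and id2
agg <- aggregate(cbind(mean_val1 = val1, mean_val2 = val2) ~ id1 + id2,
                 data = df1, FUN = mean)
print(agg)
```

Listing the columns explicitly is the trade-off here: unlike the . shortcut, each value column has to be named in cbind().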

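Or we can use summarise_each from dplyr after grouping with group_by. Note that summarise_each() and funs() have since been deprecated in dplyr, so newer code should prefer across().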

library(dplyr)
df1 %>%
    group_by(id1, id2) %>% 
    summarise_each(funs(mean))

Or using summarise with across (across() first appeared in the dplyr development version 0.8.99.9000 and was released in dplyr 1.0.0)

df1 %>% 
    group_by(id1, id2) %>%
    summarise(across(starts_with('val'), mean))
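For readers who prefer to stay in base R, a rough counterpart of across(starts_with('val'), mean) is to pick the value columns by prefix with startsWith() and pass data-frame subsets as both x and by to aggregate(), which then reuses the existing column names. This is a sketch under the assumption that the data looks like df1 below; the _mean suffix added with paste0() is an illustrative choice, not something aggregate() requires:

```r
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

# Value columns are those whose names start with "val" (startsWith() is base R)
val_cols <- names(df1)[startsWith(names(df1), "val")]

# Data-frame subsets are lists, so their column names carry over to the result
agg <- aggregate(df1[val_cols], by = df1[c("id1", "id2")], FUN = mean)

# Rename only the summary columns, e.g. val1 -> val1_mean
idx <- match(val_cols, names(agg))
names(agg)[idx] <- paste0(val_cols, "_mean")
print(agg)
```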

Another option is data.table. We convert the data.frame to a data.table with setDT(df1) (which modifies df1 by reference), group by id1 and id2, loop through the columns of each group's subset of the data (.SD, which excludes the grouping columns) and get the mean of each.

library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]
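What lapply(.SD, mean) does for each id1/id2 group can be mimicked in base R with split() and colMeans(). This only sketches the idea and is not how data.table implements grouping internally; also note that split() orders the groups by factor level, whereas data.table's by keeps the order in which groups first appear:

```r
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

val_cols <- c("val1", "val2")

# One sub-data-frame per observed id1/id2 combination (drop = TRUE skips empty ones)
groups <- split(df1, df1[c("id1", "id2")], drop = TRUE)

# For each group, average every value column (the role .SD plays in data.table)
rows <- lapply(groups, function(g) {
  data.frame(id1 = g$id1[1], id2 = g$id2[1], t(colMeans(g[val_cols])))
})

# Stack the one-row results back into a single data frame
res <- do.call(rbind, rows)
rownames(res) <- NULL
print(res)
```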

data

df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
                      id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
                      val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
                      val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)),
                 .Names = c("id1", "id2", "val1", "val2"),
                 class = "data.frame",
                 row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))

answered Oct 20 '22 07:10

akrun

You could try:

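# Naming the list elements sets the names of the output columns
agg <- aggregate(list(val1 = x$val1, val2 = x$val2, val3 = x$val3, val4 = x$val4),
                 by = list(id1 = x$id1, id2 = x$id2), FUN = mean)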
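With this list interface, the names of the two lists decide the output column names: if the by list is unnamed, aggregate() labels the grouping columns Group.1, Group.2 and so on, so naming the elements also answers the renaming part of the question. A small sketch, assuming a data frame x with only val1 and val2 (the question's val3 and val4 have no values to average); the mean_ names are illustrative:

```r
x <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L)
)

# Unnamed lists: the grouping columns fall back to Group.1, Group.2
unnamed <- aggregate(list(x$val1, x$val2), by = list(x$id1, x$id2), FUN = mean)
print(names(unnamed))

# Named lists: every output column takes the name given here
named <- aggregate(list(mean_val1 = x$val1, mean_val2 = x$val2),
                   by = list(id1 = x$id1, id2 = x$id2), FUN = mean)
print(named)
```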

answered Oct 20 '22 09:10

Filipe Mencarini
