Aggregate multiple rows of the same data.frame in R based on common values in given columns

I have a data.frame that looks like this:

# set example data
df <- read.table(textConnection("item\tsize\tweight\tvalue
A\t2\t3\t4
A\t2\t3\t6
B\t1\t2\t3
C\t3\t2\t1
B\t1\t2\t4
B\t1\t2\t2"), header = TRUE)

# print example data
df
  item size weight value
1    A    2      3     4
2    A    2      3     6
3    B    1      2     3
4    C    3      2     1
5    B    1      2     4
6    B    1      2     2

As you can see, the size and weight columns add no complexity, since they are identical within each item. However, the same item can have several different values.
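
The claim that size and weight are constant within each item can be checked before collapsing. The following is only a sketch: it rebuilds the example data with data.frame() instead of read.table(), and the names distinct_counts and constant_within_item are made up for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  item   = c("A", "A", "B", "C", "B", "B"),
  size   = c(2, 2, 1, 3, 1, 1),
  weight = c(3, 3, 2, 2, 2, 2),
  value  = c(4, 6, 3, 1, 4, 2)
)

# Count the distinct size and weight values within each item
distinct_counts <- aggregate(
  cbind(size, weight) ~ item,
  data = df,
  FUN = function(x) length(unique(x))
)

# TRUE when every item has exactly one size and exactly one weight
constant_within_item <- all(distinct_counts$size == 1,
                            distinct_counts$weight == 1)
constant_within_item
```

If this returned FALSE, grouping by item alone and grouping by item, size and weight would give different numbers of rows.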

I want to collapse the data.frame to have one row per item using the mean value:

  item size weight value
1    A    2      3     5
3    B    1      2     3
4    C    3      2     1

I guess I need the aggregate function, but I could not figure out exactly how to get the result above.

mschilli asked Aug 14 '13 09:08


4 Answers

Nowadays, this is what I would do:

library(dplyr)

df %>%
  group_by(item, size, weight) %>%
  summarize(value = mean(value)) %>%
  ungroup()

This yields the following result:

# A tibble: 3 x 4
   item  size weight value
  <chr> <int>  <int> <dbl>
1     A     2      3     5
2     B     1      2     3
3     C     3      2     1

I will leave the accepted answer as it is, since I specifically asked for aggregate, but I find the dplyr solution the most readable.

mschilli answered Oct 14 '22 14:10


Here is a solution using ddply from the plyr package:

library(plyr)
ddply(df, .(item), colwise(mean))
  item size weight value
1    A    2      3     5
2    B    1      2     3
3    C    3      2     1

Metrics answered Oct 18 '22 21:10


aggregate(value ~ item + size + weight, data = df, FUN = mean)

  item size weight value
1    B    1      2     3
2    C    3      2     1
3    A    2      3     5
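
As the output above shows, aggregate() does not return the rows ordered by item. If the order from the question matters, the result can be sorted afterwards. This is a minimal sketch, assuming the example data from the question (rebuilt here with data.frame()); res is just an illustrative name.

```r
# Rebuild the example data from the question
df <- data.frame(
  item   = c("A", "A", "B", "C", "B", "B"),
  size   = c(2, 2, 1, 3, 1, 1),
  weight = c(3, 3, 2, 2, 2, 2),
  value  = c(4, 6, 3, 1, 4, 2)
)

# One row per (item, size, weight) combination, with the mean of value
res <- aggregate(value ~ item + size + weight, data = df, FUN = mean)

# Sort the rows by item and reset the row names
res <- res[order(res$item), ]
rownames(res) <- NULL
res
```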

Mark Miller answered Oct 18 '22 22:10


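# Replace each value with the mean of its item (this overwrites df$value)
df$value <- ave(df$value, df$item, FUN = mean)
# Keep only the first row of each item
df[!duplicated(df$item), ]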

  item size weight value
1    A    2      3     5
3    B    1      2     3
4    C    3      2     1
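
Because the code above overwrites df$value, the original values are lost. A possible variant, sketched here under the assumption that size and weight really are constant within each item, works on a copy instead; the name out is made up for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  item   = c("A", "A", "B", "C", "B", "B"),
  size   = c(2, 2, 1, 3, 1, 1),
  weight = c(3, 3, 2, 2, 2, 2),
  value  = c(4, 6, 3, 1, 4, 2)
)

# Work on a copy so that df keeps its original values
out <- df
# Replace each value with the mean of its item
out$value <- ave(out$value, out$item, FUN = mean)
# Keep the first row of each item; size and weight are taken from that row
out <- out[!duplicated(out$item), ]
out
```

The row names 1, 3 and 4 are kept, which matches the layout requested in the question.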

Thomas answered Oct 18 '22 20:10