
Summing rows based on specific factor combinations

Tags: r, data.table, plyr

This is probably a silly question, but I have read through Crawley's chapter on dataframes and scoured the internet and haven't yet been able to make anything work.

Here is a sample dataset similar to mine:

> data<-data.frame(site=c("A","A","A","A","B","B"), plant=c("buttercup","buttercup",
"buttercup","rose","buttercup","rose"), treatment=c(1,1,2,1,1,1), 
plant_numb=c(1,1,2,1,1,2), fruits=c(1,2,1,4,3,2),seeds=c(45,67,32,43,13,25))
> data
  site     plant treatment plant_numb fruits seeds
1    A buttercup         1          1      1    45
2    A buttercup         1          1      2    67
3    A buttercup         2          2      1    32
4    A      rose         1          1      4    43
5    B buttercup         1          1      3    13
6    B      rose         1          2      2    25  

What I would like to do is sum "seeds" and "fruits" across all rows that share the same site & plant & treatment & plant_numb combination. Ideally, this would reduce the number of rows but preserve the original columns, i.e. the example above should end up looking like this:

  site     plant treatment plant_numb fruits seeds
1    A buttercup         1          1      3   112
2    A buttercup         2          2      1    32
3    A      rose         1          1      4    43
4    B buttercup         1          1      3    13
5    B      rose         1          2      2    25

This example is pretty basic (my dataset is ~5000 rows), and although here you only see two rows that are required to be summed, the numbers of rows that need to be summed vary, and range from 1 to ~45.

I've tried rowsum() and tapply() with pretty dismal results so far (the errors are telling me that these functions are not meaningful for factors), so if you could even point me in the right direction, I would greatly appreciate it!

Thanks so much!

asked May 03 '12 03:05 by user1371443


1 Answer

Hopefully the following code is fairly self-explanatory. It uses the base function "aggregate", and it basically says: for each unique combination of site, plant, treatment, and plant_numb, compute the sum of fruits and the sum of seeds.

# Load your data
data <- data.frame(site=c("A","A","A","A","B","B"), plant=c("buttercup","buttercup",
"buttercup","rose","buttercup","rose"), treatment=c(1,1,2,1,1,1), 
plant_numb=c(1,1,2,1,1,2), fruits=c(1,2,1,4,3,2),seeds=c(45,67,32,43,13,25)) 

# Summarize your data
aggregate(cbind(fruits, seeds) ~ 
      site + plant + treatment + plant_numb, 
      sum, 
      data = data)
#  site     plant treatment plant_numb fruits seeds
#1    A buttercup         1          1      3   112
#2    B buttercup         1          1      3    13
#3    A      rose         1          1      4    43
#4    B      rose         1          2      2    25
#5    A buttercup         2          2      1    32

The order of the rows changes (aggregate sorts the result by the grouping variables, with the last one, plant_numb, varying slowest and site varying fastest), but hopefully that isn't too much of a concern.
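If the row order does matter, the aggregated result can be re-sorted afterwards. The following is a minimal sketch using base R's order(); the name `totals` is just an illustrative variable, not something from the question.

```r
# Same sample data as in the question
data <- data.frame(
  site       = c("A", "A", "A", "A", "B", "B"),
  plant      = c("buttercup", "buttercup", "buttercup", "rose", "buttercup", "rose"),
  treatment  = c(1, 1, 2, 1, 1, 1),
  plant_numb = c(1, 1, 2, 1, 1, 2),
  fruits     = c(1, 2, 1, 4, 3, 2),
  seeds      = c(45, 67, 32, 43, 13, 25)
)

# Sum fruits and seeds within each site/plant/treatment/plant_numb group
totals <- aggregate(cbind(fruits, seeds) ~ site + plant + treatment + plant_numb,
                    data = data, FUN = sum)

# Re-sort so that site is the primary key, then plant, treatment, plant_numb
totals <- totals[order(totals$site, totals$plant,
                       totals$treatment, totals$plant_numb), ]
rownames(totals) <- NULL  # renumber the rows 1..n after sorting
totals
```

After the re-sort the rows should come out in the same order as the desired output shown in the question.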

An alternative way to do this would be to use ddply from the plyr package.

library(plyr)
ddply(data, .(site, plant, treatment, plant_numb), 
      summarize, 
      fruits = sum(fruits), 
      seeds = sum(seeds))
#  site     plant treatment plant_numb fruits seeds
#1    A buttercup         1          1      3   112
#2    A buttercup         2          2      1    32
#3    A      rose         1          1      4    43
#4    B buttercup         1          1      3    13
#5    B      rose         1          2      2    25
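Since the question is also tagged data.table, here is a sketch of the same grouped sum with that package. It assumes data.table is installed; `dt` and `result` are illustrative names only.

```r
library(data.table)

# Same sample data as in the question, built as a data.table
dt <- data.table(
  site       = c("A", "A", "A", "A", "B", "B"),
  plant      = c("buttercup", "buttercup", "buttercup", "rose", "buttercup", "rose"),
  treatment  = c(1, 1, 2, 1, 1, 1),
  plant_numb = c(1, 1, 2, 1, 1, 2),
  fruits     = c(1, 2, 1, 4, 3, 2),
  seeds      = c(45, 67, 32, 43, 13, 25)
)

# For each unique grouping combination, sum fruits and seeds.
# `by` keeps groups in order of first appearance; `keyby` would sort them instead.
result <- dt[, .(fruits = sum(fruits), seeds = sum(seeds)),
             by = .(site, plant, treatment, plant_numb)]
result
```

For a dataset of ~5000 rows any of these approaches should be fast enough; the choice is mostly a matter of which syntax you prefer.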
answered Oct 27 '22 16:10 by Dason