Summing rows based on specific factor combinations

Tags: r, data.table, plyr

This is probably a silly question, but I have read through Crawley's chapter on dataframes and scoured the internet and haven't yet been able to make anything work.

Here is a sample dataset similar to mine:

> data<-data.frame(site=c("A","A","A","A","B","B"), plant=c("buttercup","buttercup",
"buttercup","rose","buttercup","rose"), treatment=c(1,1,2,1,1,1), 
plant_numb=c(1,1,2,1,1,2), fruits=c(1,2,1,4,3,2),seeds=c(45,67,32,43,13,25))
> data
  site     plant treatment plant_numb fruits seeds
1    A buttercup         1          1      1    45
2    A buttercup         1          1      2    67
3    A buttercup         2          2      1    32
4    A      rose         1          1      4    43
5    B buttercup         1          1      3    13
6    B      rose         1          2      2    25

What I would like to do is sum "seeds" and "fruits" within each unique site & plant & treatment & plant_numb combination. Ideally, this would reduce the number of rows but preserve the original columns, i.e. I need the above example to look like this:

  site     plant treatment plant_numb fruits seeds
1    A buttercup         1          1      3   112
2    A buttercup         2          2      1    32
3    A      rose         1          1      4    43
4    B buttercup         1          1      3    13
5    B      rose         1          2      2    25

This example is pretty basic (my dataset is ~5000 rows), and although here only two rows need to be summed, the number of rows that need to be summed per combination varies, ranging from 1 to ~45.

I've tried rowsum() and tapply() with pretty dismal results so far (the errors are telling me that these functions are not meaningful for factors), so if you could even point me in the right direction, I would greatly appreciate it!

Thanks so much!

asked May 03 '12 03:05

user1371443

1 Answer

Hopefully the following code is fairly self-explanatory. It uses the base function "aggregate": for each unique combination of site, plant, treatment, and plant_numb, it computes the sum of fruits and the sum of seeds.

# Load your data
data <- data.frame(site=c("A","A","A","A","B","B"), plant=c("buttercup","buttercup",
"buttercup","rose","buttercup","rose"), treatment=c(1,1,2,1,1,1), 
plant_numb=c(1,1,2,1,1,2), fruits=c(1,2,1,4,3,2),seeds=c(45,67,32,43,13,25)) 

# Summarize your data
aggregate(cbind(fruits, seeds) ~ 
      site + plant + treatment + plant_numb, 
      sum, 
      data = data)
#  site     plant treatment plant_numb fruits seeds
#1    A buttercup         1          1      3   112
#2    B buttercup         1          1      3    13
#3    A      rose         1          1      4    43
#4    B      rose         1          2      2    25
#5    A buttercup         2          2      1    32

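The order of the rows changes: judging by the output above, the groups are sorted with the last grouping variable (plant_numb) varying slowest and the first (site) varying fastest, rather than by site, then plant, and so on. Hopefully that isn't too much of a concern.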
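If the row order does matter, one option is to re-sort the aggregated result with order(). This is only a sketch using base R; the name res is introduced here for illustration and is not part of the original code.

```r
# Same sample data as in the question
data <- data.frame(site = c("A","A","A","A","B","B"),
                   plant = c("buttercup","buttercup","buttercup","rose","buttercup","rose"),
                   treatment = c(1,1,2,1,1,1),
                   plant_numb = c(1,1,2,1,1,2),
                   fruits = c(1,2,1,4,3,2),
                   seeds = c(45,67,32,43,13,25))

# Sum fruits and seeds within each combination of the four grouping columns
res <- aggregate(cbind(fruits, seeds) ~ site + plant + treatment + plant_numb,
                 data = data, FUN = sum)

# Re-sort by site, then plant, then treatment, then plant_numb
res <- res[order(res$site, res$plant, res$treatment, res$plant_numb), ]

# Reset the row names so they run 1..n again
rownames(res) <- NULL
res
```

With the sample data this should give the rows in the same order as the desired output in the question.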

An alternative way to do this would be to use ddply from the plyr package.

library(plyr)
ddply(data, .(site, plant, treatment, plant_numb), 
      summarize, 
      fruits = sum(fruits), 
      seeds = sum(seeds))
#  site     plant treatment plant_numb fruits seeds
#1    A buttercup         1          1      3   112
#2    A buttercup         2          2      1    32
#3    A      rose         1          1      4    43
#4    B buttercup         1          1      3    13
#5    B      rose         1          2      2    25
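Since the question is also tagged data.table, here is a rough equivalent using that package. It is a sketch that assumes data.table is installed; the names dt and res_dt are made up for this example.

```r
library(data.table)

# Same sample data as in the question
data <- data.frame(site = c("A","A","A","A","B","B"),
                   plant = c("buttercup","buttercup","buttercup","rose","buttercup","rose"),
                   treatment = c(1,1,2,1,1,1),
                   plant_numb = c(1,1,2,1,1,2),
                   fruits = c(1,2,1,4,3,2),
                   seeds = c(45,67,32,43,13,25))

# Convert to a data.table (leaves the original data frame untouched)
dt <- as.data.table(data)

# Sum fruits and seeds within each combination of the grouping columns;
# by= keeps groups in order of first appearance
res_dt <- dt[, .(fruits = sum(fruits), seeds = sum(seeds)),
             by = .(site, plant, treatment, plant_numb)]
res_dt
```

On a ~5000-row dataset any of the three approaches should be fast enough, so the choice mostly comes down to which syntax you prefer.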

answered Oct 27 '22 16:10

Dason
