How to sum rows based on multiple conditions - R? [duplicate]

I have a data frame that contains a plot ID (plotID), a tree species code (species), and a cover value (cover). Some species are recorded more than once within the same plot. How can I sum the "cover" values for duplicate "species" rows within each plot?

For example, here is some sample data:

# Sample Data
plotID = c( "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200046012040",
       "SUF200046012040", "SUF200046012040", "SUF200046012040", "SUF200046012040", "SUF200046012040", "SUF200046012040")
species = c("ABBA",  "BEPA",  "PIBA2", "PIMA",  "PIRE",  "PIBA2", "PIBA2", "PIMA",  "PIMA",  "PIRE",  "POTR5", "POTR5")
cover = c(26.893939,  5.681818,  9.469697, 16.287879,  1.893939, 16.287879,  4.166667, 10.984848, 16.666667, 11.363636, 18.181818,
          13.257576)
df_original = data.frame(plotID, species, cover)

And here is the intended output:

# Intended Output
plotID2 = c( "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200046012040",
            "SUF200046012040", "SUF200046012040", "SUF200046012040")
species2 = c("ABBA",  "BEPA",  "PIBA2", "PIMA",  "PIRE",  "PIBA2", "PIMA",  "PIRE",  "POTR5")
cover2 = c(26.893939,  5.681818,  9.469697, 16.287879,  1.893939, 20.454546, 27.651515, 11.363636, 31.439394)
df_intended_output = data.frame(plotID2, species2, cover2)

Borealis asked Nov 29 '22 23:11


1 Answer

Easy with aggregate

aggregate(cover~species+plotID, data=df_original, FUN=sum) 

Easier with data.table (naming the result column so it is not left as the default V1)

library(data.table)
as.data.table(df_original)[, .(cover = sum(cover)), by = .(plotID, species)]
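
As a sanity check on the base R aggregate approach above, here is a minimal sketch that rebuilds the sample data from the question, sums cover per plot and species, and cross-checks one group by hand. The names `summed` and `pima_manual` are illustrative only and do not come from the question or the answer.

```r
# Sample data from the question, rebuilt so this block runs on its own.
df_original <- data.frame(
  plotID = c(rep("SUF200001035014", 5), rep("SUF200046012040", 7)),
  species = c("ABBA", "BEPA", "PIBA2", "PIMA", "PIRE",
              "PIBA2", "PIBA2", "PIMA", "PIMA", "PIRE", "POTR5", "POTR5"),
  cover = c(26.893939, 5.681818, 9.469697, 16.287879, 1.893939,
            16.287879, 4.166667, 10.984848, 16.666667, 11.363636,
            18.181818, 13.257576)
)

# The formula reads: sum `cover` within each species/plotID combination.
summed <- aggregate(cover ~ species + plotID, data = df_original, FUN = sum)

# Reorder columns so plotID comes first, as in the question.
summed <- summed[, c("plotID", "species", "cover")]

# Independent cross-check for one group: PIMA in the second plot.
# (`pima_manual` is a hypothetical helper name used only for this check.)
pima_manual <- sum(df_original$cover[df_original$plotID == "SUF200046012040" &
                                       df_original$species == "PIMA"])

print(summed)
```

The twelve input rows collapse to nine, one per plot and species pair. Species that appear only once in a plot keep their original cover value, since summing a single value returns it unchanged.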
ExperimenteR answered Dec 10 '22 09:12