I have a data frame where one column is species' names, and the second column is abundance values. Due to the sampling procedure, some species appear more than once (i.e., there is more than one row with Species X in it). I would like to consolidate those entries and sum their abundances.
For example, given this data frame:
set.seed(6)
df <- data.frame(
  x = c("sp1", "sp2", "sp3", "sp3", "sp4", "sp2", "sp3"),
  y = rpois(7, 2)
)
df
which produces:
    x y
1 sp1 2
2 sp2 4
3 sp3 1
4 sp3 1
5 sp4 3
6 sp2 5
7 sp3 5
I would like to instead produce:
    x y
1 sp1 2
2 sp2 9 (5+4)
3 sp3 7 (5+1+1)
5 sp4 3
Thanks in advance for any help you can provide!
Click Data>Consolidate (in the Data Tools group). In the Function box, click the summary function that you want Excel to use to consolidate the data. The default function is SUM. Select your data.
This works:
library(plyr)
ddply(df, "x", numcolwise(sum))
In words: (1) split the data frame df by the "x" column; (2) for each chunk, take the sum of each numeric-valued column; (3) stick the results back into a single data frame. (The dd in ddply stands for "take a data frame as input, return a data frame".)
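The three steps can also be written out by hand in base R, which may make it clearer what ddply is doing. This is only a sketch of the idea, not how plyr is implemented; the y values are typed in from the output printed in the question rather than regenerated with rpois, and the names chunks, sums and result are made up for illustration.

```r
# Same data as in the question, with y copied from the printed output
df <- data.frame(
  x = c("sp1", "sp2", "sp3", "sp3", "sp4", "sp2", "sp3"),
  y = c(2, 4, 1, 1, 3, 5, 5)
)

# (1) split: one vector of abundances per species
chunks <- split(df$y, df$x)

# (2) apply: sum each chunk (vapply checks that each result is a single number)
sums <- vapply(chunks, sum, numeric(1))

# (3) combine: put the per-species sums back into a data frame
result <- data.frame(x = names(sums), y = unname(sums))
result
```

With more than one numeric column, the split and sum steps would have to be repeated per column, which is the bookkeeping that numcolwise(sum) takes care of.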
Another, possibly clearer, approach:
aggregate(y~x,data=df,FUN=sum)
See "quick/elegant way to construct mean/variance summary table" for a related (slightly more complex) question.
Simple as aggregate:
aggregate(df['y'], by=df['x'], sum)
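For comparison, here is a small sketch that runs this by= form next to the formula form aggregate(y~x, data=df, FUN=sum) on the question's data. The y values are copied from the printed output so the example does not depend on the random number generator, and by_formula and by_list are names chosen just for this illustration.

```r
# Same data as in the question, with y copied from the printed output
df <- data.frame(
  x = c("sp1", "sp2", "sp3", "sp3", "sp4", "sp2", "sp3"),
  y = c(2, 4, 1, 1, 3, 5, 5)
)

# Formula interface: "sum y within each level of x"
by_formula <- aggregate(y ~ x, data = df, FUN = sum)

# by= interface: df['y'] and df['x'] are one-column data frames,
# so the column names x and y carry over to the result
by_list <- aggregate(df["y"], by = df["x"], FUN = sum)

by_formula
by_list

# Both should report the same per-species totals
all.equal(by_formula$y, by_list$y)
```

One difference to keep in mind: by default the formula interface drops rows with missing values (na.action = na.omit), whereas the by= form passes them on to sum, so the two can disagree when y contains NA.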