
Consolidate duplicate rows

Tags: r

I have a data frame where one column is species' names, and the second column is abundance values. Due to the sampling procedure, some species appear more than once (i.e., there is more than one row with Species X in it). I would like to consolidate those entries and sum their abundances.

For example, given this data frame:

set.seed(6)
df = data.frame(
  x = c("sp1","sp2","sp3","sp3","sp4","sp2","sp3"),
  y = rpois(7,2))
df

which produces:

    x y
1 sp1 2
2 sp2 4
3 sp3 1
4 sp3 1
5 sp4 3
6 sp2 5
7 sp3 5

I would like to instead produce:

    x y
1 sp1 2
2 sp2 9     (5+4)
3 sp3 7     (5+1+1)
5 sp4 3

Thanks in advance for any help you can provide!

jslefche asked Apr 16 '12 19:04



2 Answers

This works:

library(plyr)
ddply(df, "x", numcolwise(sum))

In words: (1) split the data frame df by the "x" column; (2) for each chunk, take the sum of each numeric-valued column; (3) stick the results back into a single data frame. (The dd in ddply stands for "take a data frame as input, return a data frame".)
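The same three steps can be sketched in base R, without plyr. This only illustrates the split/apply/combine idea and is not how ddply works internally; the y values are hard-coded from the output printed in the question rather than regenerated with rpois, and the names chunks, sums and result are just for illustration.

```r
# Data from the question, with y copied from the printed output
df <- data.frame(
  x = c("sp1", "sp2", "sp3", "sp3", "sp4", "sp2", "sp3"),
  y = c(2, 4, 1, 1, 3, 5, 5)
)

# (1) split the y values into one chunk per species
chunks <- split(df$y, df$x)

# (2) sum each chunk
sums <- vapply(chunks, sum, numeric(1))

# (3) stick the results back into a single data frame
result <- data.frame(x = names(sums), y = unname(sums))
result
```

In practice ddply or aggregate does all of this in one call; spelling it out is mainly useful for seeing what each step contributes.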

Another, possibly clearer, approach:

aggregate(y~x,data=df,FUN=sum) 

See "quick/elegant way to construct mean/variance summary table" for a related (slightly more complex) question.

Ben Bolker answered Oct 26 '22 23:10


Simple as aggregate:

aggregate(df['y'], by=df['x'], sum) 
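
As a quick check, here is a small sketch that runs this by= form next to the formula form on the question's data and compares the sums. The y values are again hard-coded from the printed output in the question, and by_list and by_formula are illustrative names only.

```r
# Data from the question, with y copied from the printed output
df <- data.frame(
  x = c("sp1", "sp2", "sp3", "sp3", "sp4", "sp2", "sp3"),
  y = c(2, 4, 1, 1, 3, 5, 5)
)

# by= form: single-bracket indexing keeps both pieces as named lists,
# so the output columns are labelled x and y
by_list <- aggregate(df["y"], by = df["x"], FUN = sum)

# formula form: "sum y within each level of x"
by_formula <- aggregate(y ~ x, data = df, FUN = sum)

by_list
all(by_list$y == by_formula$y)
```

Either form returns one row per species, so the choice is mostly a matter of which syntax you find easier to read.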
Joshua Ulrich answered Oct 27 '22 00:10