
Merging data frames without duplicating rows

Tags: dataframe, r

I would like to merge two data frames, but do not want to duplicate rows if there is more than one match. Instead I would like to sum the observations on that day.

From ?merge: The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each.

Here's some example code:

days <- as.data.frame(as.Date(c("2012-1-1", "2012-1-2", "2012-1-3", "2012-1-4")))
names(days) <- "Date"
obs.days <- as.data.frame(as.Date(c("2012-1-2", "2012-1-3", "2012-1-3")))
obs.days$count <- 1
colnames(obs.days) <- c("Date", "Count")
df <- merge(days, obs.days, by.x="Date", by.y="Date", all.x=TRUE)

I would like the final data frame to only list 2012-1-3 one time with a count value of 2.

Boom Shakalaka asked Jan 12 '12 01:01



1 Answer

I'd suggest you merge them and then aggregate them (essentially perform a SUM for each unique Date).

df <- merge(days, obs.days, by.x="Date", by.y="Date", all.x=TRUE)
        Date Count
1 2012-01-01    NA
2 2012-01-02     1
3 2012-01-03     1
4 2012-01-03     1
5 2012-01-04    NA

Now, to sum the counts for each date, you could use aggregate:

df2 <- aggregate(df$Count,list(df$Date),sum)
     Group.1  x
1 2012-01-01 NA
2 2012-01-02  1
3 2012-01-03  2
4 2012-01-04 NA
names(df2)<-names(df)
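
As a variation on the aggregate call above, here is a minimal sketch that passes one-column data frames instead of bare vectors, so the result keeps the names Date and Count and the renaming step is not needed. It rebuilds the question's data so it runs on its own. The names df3 and the choice to treat days without observations as 0 (via na.rm = TRUE) are my own assumptions, not something the question asked for.

```r
# Rebuild the example data from the question
days <- data.frame(Date = as.Date(c("2012-01-01", "2012-01-02",
                                    "2012-01-03", "2012-01-04")))
obs.days <- data.frame(Date = as.Date(c("2012-01-02", "2012-01-03",
                                        "2012-01-03")),
                       Count = 1)

# Left join: every day is kept, matched days get one row per observation
df <- merge(days, obs.days, by = "Date", all.x = TRUE)

# df["Count"] and df["Date"] are one-column data frames, so aggregate()
# carries their column names through to the result
df2 <- aggregate(df["Count"], by = df["Date"], FUN = sum)

# Optional (assumption): report days with no observations as 0 instead of NA.
# Extra arguments such as na.rm are passed on to sum().
df3 <- aggregate(df["Count"], by = df["Date"], FUN = sum, na.rm = TRUE)
```

Whether NA or 0 is the better value for days without observations depends on what the counts are used for afterwards, so both variants are shown.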

However, I'd recommend the plyr package, which is awesome, and in particular its ddply function.

library(plyr)
ddply(df,.(Date),function(x) data.frame(Date=x$Date[1],Count=sum(x$Count)))
        Date Count
1 2012-01-01    NA
2 2012-01-02     1
3 2012-01-03     2
4 2012-01-04    NA

The command ddply(df,.(Date),FUN) essentially does:

for each date in unique(df$Date):
    add to output dataframe FUN( df[df$Date==date,] )

So the function I've provided creates a one-row data frame with columns Date and Count, where Count is the sum of all counts for that date.
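
For readers who want to see that loop without plyr, here is a rough base R sketch of the same split, apply, combine idea. It is only an illustration of what the pseudocode describes, not how ddply is actually implemented. The names summarise_day, pieces and out are hypothetical.

```r
# Rebuild the merged data frame from the question
days <- data.frame(Date = as.Date(c("2012-01-01", "2012-01-02",
                                    "2012-01-03", "2012-01-04")))
obs.days <- data.frame(Date = as.Date(c("2012-01-02", "2012-01-03",
                                        "2012-01-03")),
                       Count = 1)
df <- merge(days, obs.days, by = "Date", all.x = TRUE)

# Same function as the one passed to ddply: one row per chunk,
# with Count being the sum over all rows in that chunk
summarise_day <- function(x) {
  data.frame(Date = x$Date[1], Count = sum(x$Count))
}

# Split: one small data frame per unique date
pieces <- split(df, df$Date)

# Apply the function to each piece, then combine the one-row results
out <- do.call(rbind, lapply(pieces, summarise_day))
rownames(out) <- NULL  # drop the date labels that rbind uses as row names
```

Here out should match the ddply result above: one row per date, with the two observations on 2012-01-03 summed into a single row.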

mathematical.coffee answered Sep 28 '22 09:09
