I would like to merge two data frames, but do not want to duplicate rows if there is more than one match. Instead I would like to sum the observations on that day.
From ?merge: The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each.
Here's some example code:
days <- as.data.frame(as.Date(c("2012-1-1", "2012-1-2", "2012-1-3", "2012-1-4")))
names(days) <- "Date"
obs.days <- as.data.frame(as.Date(c("2012-1-2", "2012-1-3", "2012-1-3")))
obs.days$count <- 1
colnames(obs.days) <- c("Date", "Count")
df <- merge(days, obs.days, by.x="Date", by.y="Date", all.x=TRUE)
I would like the final data frame to only list 2012-1-3 one time with a count value of 2.
I'd suggest you merge them and then aggregate them (essentially perform a SUM for each unique Date).
df <- merge(days, obs.days, by.x="Date", by.y="Date", all.x=TRUE)
Date Count
1 2012-01-01 NA
2 2012-01-02 1
3 2012-01-03 1
4 2012-01-03 1
5 2012-01-04 NA
Now to do the aggregation you could use aggregate:
df2 <- aggregate(df$Count,list(df$Date),sum)
Group.1 x
1 2012-01-01 NA
2 2012-01-02 1
3 2012-01-03 2
4 2012-01-04 NA
names(df2)<-names(df)
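The order can also be reversed: aggregate the observations first, then merge, so the join never has more than one match per date. This is a minimal sketch using the question's data; the names counts and df3 are made up for illustration, and replacing NA with 0 is an assumption about what an empty day should show (drop that line to keep the NAs).

```r
# Rebuild the example data from the question.
days <- data.frame(Date = as.Date(c("2012-01-01", "2012-01-02",
                                    "2012-01-03", "2012-01-04")))
obs.days <- data.frame(Date = as.Date(c("2012-01-02", "2012-01-03",
                                        "2012-01-03")),
                       Count = 1)

# Collapse the observations to one row per date first ...
counts <- aggregate(Count ~ Date, data = obs.days, FUN = sum)

# ... so the left join can no longer duplicate rows.
df3 <- merge(days, counts, by = "Date", all.x = TRUE)

# Assumption: days without observations should read 0 rather than NA.
df3$Count[is.na(df3$Count)] <- 0

print(df3)
```

Aggregating before the merge also avoids summing over NA values, since the NAs only appear after the join.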
BUT I'd recommend the package plyr, which is awesome! In particular, its function ddply.
library(plyr)
ddply(df,.(Date),function(x) data.frame(Date=x$Date[1],Count=sum(x$Count)))
Date Count
1 2012-01-01 NA
2 2012-01-02 1
3 2012-01-03 2
4 2012-01-04 NA
The command ddply(df,.(Date),FUN) essentially does:
for each date in unique(df$Date):
add to output dataframe FUN( df[df$Date==date,] )
So the function I've provided creates a one-row data frame with columns Date and Count, where Count is the sum of all counts for that date.
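The split-apply-combine loop described above can be written out in base R roughly as follows. This is only a sketch of the idea, not how plyr is implemented internally; sum_counts, pieces, results and out are hypothetical names introduced for the example.

```r
# Rebuild the merged data frame from the question.
days <- data.frame(Date = as.Date(c("2012-01-01", "2012-01-02",
                                    "2012-01-03", "2012-01-04")))
obs.days <- data.frame(Date = as.Date(c("2012-01-02", "2012-01-03",
                                        "2012-01-03")),
                       Count = 1)
df <- merge(days, obs.days, by = "Date", all.x = TRUE)

# Same body as the function passed to ddply (the helper name is made up).
sum_counts <- function(x) data.frame(Date = x$Date[1], Count = sum(x$Count))

pieces  <- split(df, df$Date)          # one sub-data-frame per unique Date
results <- lapply(pieces, sum_counts)  # apply the function to each piece
out     <- do.call(rbind, results)     # stack the one-row results together
rownames(out) <- NULL                  # drop the row names rbind takes from the list

print(out)
```

Dates with no observations still come out as NA here, because sum() over NA is NA, which matches the ddply output shown earlier.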