I have the following data frame:
id <- c(1,1,2,3,3)
date <- c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
df <- data.frame(id,date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")
id date date2
1 23-01-08 2008-01-23
1 01-11-07 2007-11-01
2 30-11-07 2007-11-30
3 17-12-07 2007-12-17
3 12-12-08 2008-12-12
now I need to create a forth column and insert maximum date of transaction for each id
in that.
the final table should be as follow:
id date date2 max
1 23-01-08 2008-01-23 2008-01-23
1 01-11-07 2007-11-01 0
2 30-11-07 2007-11-30 2007-11-30
3 17-12-07 2007-12-17 0
3 12-12-08 2008-12-12 2008-12-12
I would be thankful if you could help me with this.
Adding dplyr
solution in case someone is looking:
library(dplyr)
df %>%
group_by(id) %>%
mutate(max = if_else(date2 == max(date2), date2, as.Date(NA)))
Result:
# A tibble: 5 x 4
# Groups: id [3]
id date date2 max
<dbl> <fctr> <date> <date>
1 1 23-01-08 2008-01-23 2008-01-23
2 1 01-11-07 2007-11-01 NA
3 2 30-11-07 2007-11-30 2007-11-30
4 3 17-12-07 2007-12-17 NA
5 3 12-12-08 2008-12-12 2008-12-12
Another approach is to use the plyr
package:
library(plyr)
ddply(df, "id", summarize, max = max(date2))
# id max
#1 1 2008-01-23
#2 2 2007-11-30
#3 3 2008-12-12
Now this isn't in the format you were after, as it only shows each id
once. Never fear, we can use transform
instead of summarize
:
ddply(df, "id", transform, max = max(date2))
# id date date2 max
#1 1 01-11-07 2007-11-01 2008-01-23
#2 1 23-01-08 2008-01-23 2008-01-23
#3 2 30-11-07 2007-11-30 2007-11-30
#4 3 12-12-08 2008-12-12 2008-12-12
#5 3 17-12-07 2007-12-17 2008-12-12
As in @seandavi's answer, this repeats the max
date for each id
. If you want to change the duplicates to NA
, something like this will do the job:
within(ddply(df, "id", transform, max = max(date2)), max[max != date2] <- NA)
id<-c(1,1,2,3,3)
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08")
df<-data.frame(id,date)
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y")
# aggregate can be used for this type of thing
d = aggregate(df$date2,by=list(df$id),max)
# And merge the result of aggregate
# with the original data frame
df2 = merge(df,d,by.x=1,by.y=1)
df2
id date date2 x
1 1 23-01-08 2008-01-23 2008-01-23
2 1 01-11-07 2007-11-01 2008-01-23
3 2 30-11-07 2007-11-30 2007-11-30
4 3 17-12-07 2007-12-17 2008-12-12
5 3 12-12-08 2008-12-12 2008-12-12
Edit: Since you want the last column to be "empty" when the date does not match the max date, you can try the next line.
df2[df2[,3]!=df2[,4],4]=NA
df2
id date date2 x
1 1 23-01-08 2008-01-23 2008-01-23
2 1 01-11-07 2007-11-01 <NA>
3 2 30-11-07 2007-11-30 2007-11-30
4 3 17-12-07 2007-12-17 <NA>
5 3 12-12-08 2008-12-12 2008-12-12
Of course, it is always nice to clean up the colnames, etc., but I leave that for you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With