
Matching multiple date values in R

Tags: matching, r, plyr

I have the following dataframe DF describing people that have worked on a project on certain dates:

ID    ProjectName    StartDate 
1       Health        3/1/06 18:20
2       Education     2/1/07 15:30
1       Education     5/3/09 9:00
3       Wellness      4/1/10 12:00
2       Health        6/1/11 14:20

The goal is to find the first project corresponding to each ID. For example the expected output would be as follows:

ID    ProjectName    StartDate 
1       Health        3/1/06 18:20
2       Education     2/1/07 15:30
3       Wellness      4/1/10 12:00

So far I have done the following to get the first StartDate for each ID:

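# Assumes the dates are month/day/year; without an explicit format they are misparsed.
sub <- ddply(DF, .(ID), summarise, st = min(as.POSIXct(StartDate, format = "%m/%d/%y %H:%M")))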

After this, I need to match each row in sub with the original DF and extract the projects corresponding to that ID and StartDate. This can be done in a loop for each row in sub. However, my dataset is very large and I would like to know if there is an efficient way to do this matching and extract this subset from DF.

user2327621 asked Dec 03 '22 23:12


1 Answer

Here's a data.table solution, which ought to be pretty efficient.

DF <- data.frame(ID=c(1,2,1,3,2,1), ProjectName=c('Health', 'Education', 'Education', 'Wellness', 'Health', 'Health'),
             StartDate=c('3/1/06 18:20', '2/1/07 15:30', '5/3/09 9:00', '4/1/10 12:00', '6/1/11 14:20', '1/1/06 11:10'))

Note that I've modified your data, adding another element at the end, so the dates are no longer sorted. Thus the output differs.

library(data.table)
d <- as.data.table(DF)

# Order by StartDate and take the first row for each ID.
# Assumes that your dates are month/day/year.

d[order(as.POSIXct(StartDate, format="%m/%d/%y %H:%M"))][,.SD[1,],by=ID]
##    ID ProjectName    StartDate
## 1:  1      Health 1/1/06 11:10
## 2:  2   Education 2/1/07 15:30
## 3:  3    Wellness 4/1/10 12:00
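
For comparison, here is a minimal base R sketch of the same idea (sort by parsed date, then keep the first row seen for each ID). It is not part of the data.table solution above. It assumes the dates are month/day/year, and the names `st`, `sorted` and `first_projects` as well as the fixed `tz = "UTC"` are choices made only for this illustration.

```r
# Same modified data as above, with strings kept as character.
DF <- data.frame(
  ID = c(1, 2, 1, 3, 2, 1),
  ProjectName = c("Health", "Education", "Education",
                  "Wellness", "Health", "Health"),
  StartDate = c("3/1/06 18:20", "2/1/07 15:30", "5/3/09 9:00",
                "4/1/10 12:00", "6/1/11 14:20", "1/1/06 11:10"),
  stringsAsFactors = FALSE
)

# Assumption: dates are month/day/year; the time zone is fixed for reproducibility.
st <- as.POSIXct(DF$StartDate, format = "%m/%d/%y %H:%M", tz = "UTC")

# Sort rows by start time, then keep only the first occurrence of each ID.
sorted <- DF[order(st), ]
first_projects <- sorted[!duplicated(sorted$ID), ]
rownames(first_projects) <- NULL

print(first_projects)
```

Because `duplicated()` flags every occurrence of an ID after the first, negating it on the date-sorted rows leaves the earliest project per ID.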

If your dates are already sorted (as in your example), this suffices:

d[,.SD[1,],by=ID]
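
The matching step described in the question (compare each row's start time with the minimum for its ID) can also be done without a loop. The following is a rough base R sketch on the question's original five rows, again assuming month/day/year dates. The column name `st` and the variable `first_projects` are made up for this example.

```r
# The question's original data, with strings kept as character.
DF <- data.frame(
  ID = c(1, 2, 1, 3, 2),
  ProjectName = c("Health", "Education", "Education", "Wellness", "Health"),
  StartDate = c("3/1/06 18:20", "2/1/07 15:30", "5/3/09 9:00",
                "4/1/10 12:00", "6/1/11 14:20"),
  stringsAsFactors = FALSE
)

# Assumption: dates are month/day/year; store them as seconds since the epoch.
DF$st <- as.numeric(as.POSIXct(DF$StartDate, format = "%m/%d/%y %H:%M", tz = "UTC"))

# ave() returns, for every row, the minimum start time within that row's ID.
# Keeping rows equal to their group minimum is the vectorised form of the match.
first_projects <- DF[DF$st == ave(DF$st, DF$ID, FUN = min), ]

print(first_projects[, c("ID", "ProjectName", "StartDate")])
```

Unlike taking the first row per group, this keeps every tied row if an ID has two projects with the same earliest start time.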
Matthew Lundberg answered Dec 25 '22 07:12