I have a dataframe with a set of dated events in each row linked to a location. Within each location I have an index event and a series of various matched events that may have happened before and/or after the index event. I need to subset all matched events that happened before the index event for each location. The data structure looks like this.
locid match date score iid
1 index 4/11/2013 15 1
1 matched 1/09/2013 23 2
1 matched 14/04/2013 1 3
1 matched 7/1/2014 21 4
2 index 2/4/2013 12 1
2 matched 1/2/2013 10 2
3 index 1/5/2013 23 1
3 matched 2/5/2013 10 2
4 index 3/3/2013 9 1
4 matched 10/2/2013 32 2
4 matched 1/10/2012 15 3
4 matched 4/3/2013 12 4
4 matched 10/3/2013 10 5
And I need to subset the dataframe so that I end up with only the rows whose date is earlier than the date of the index event for each location:
locid match date score iid
1 matched 1/09/2013 23 2
1 matched 14/04/2013 1 3
2 matched 1/2/2013 10 2
4 matched 10/2/2013 32 2
4 matched 1/10/2012 15 3
This is my first time asking here, so I hope I'm not doing this the wrong way. I tried various permutations of solutions within R, but I'm struggling to find the right one.
Here's a data.table possibility (assuming your data is called df)
library(data.table)
setDT(df)[, date := as.Date(date, format = "%d/%m/%Y")][,
.SD[date < date[match == "index"]], by = locid]
# locid match date score iid
# 1: 1 matched 2013-09-01 23 2
# 2: 1 matched 2013-04-14 1 3
# 3: 2 matched 2013-02-01 10 2
# 4: 4 matched 2013-02-10 32 2
# 5: 4 matched 2012-10-01 15 3
Possible base R solution
df <- transform(df, date = as.Date(date, format = "%d/%m/%Y"))
do.call(rbind, by(df, df$locid, FUN = function(x) x[with(x, date < date[match == "index"]), ]))
# locid match date score iid
# 1.2 1 matched 2013-09-01 23 2
# 1.3 1 matched 2013-04-14 1 3
# 2 2 matched 2013-02-01 10 2
# 4.10 4 matched 2013-02-10 32 2
# 4.11 4 matched 2012-10-01 15 3
And another possible base R solution
df <- transform(df, date = as.Date(date, format = "%d/%m/%Y"))
do.call(rbind, lapply(split(df, df$locid), function(x) x[with(x, date < date[match == "index"]), ]))
# locid match date score iid
# 1.2 1 matched 2013-09-01 23 2
# 1.3 1 matched 2013-04-14 1 3
# 2 2 matched 2013-02-01 10 2
# 4.10 4 matched 2013-02-10 32 2
# 4.11 4 matched 2012-10-01 15 3
The basic idea here is to convert your date column to Date class so R will be able to identify its order. Afterwards, we basically split the data by locid and apply a filtering function on each chunk which selects only the rows whose date comes before the date of the row where match == "index".
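To make the idea above concrete, here is a minimal self-contained sketch that rebuilds the example data from the question, shows why the Date conversion matters, and then filters each location against its index date. It is not one of the solutions above: instead of splitting by locid it looks up the index date with match(), and it assumes every locid has exactly one index row. The names as_text, idx, ref_date and earlier are made up for illustration.

```r
# Rebuild the example data from the question (dates are day/month/year text)
df <- data.frame(
  locid = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4),
  match = c("index", "matched", "matched", "matched",
            "index", "matched",
            "index", "matched",
            "index", "matched", "matched", "matched", "matched"),
  date  = c("4/11/2013", "1/09/2013", "14/04/2013", "7/1/2014",
            "2/4/2013", "1/2/2013",
            "1/5/2013", "2/5/2013",
            "3/3/2013", "10/2/2013", "1/10/2012", "4/3/2013", "10/3/2013"),
  score = c(15, 23, 1, 21, 12, 10, 23, 10, 9, 32, 15, 12, 10),
  iid   = c(1, 2, 3, 4, 1, 2, 1, 2, 1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# As text, dates are compared character by character, so 10 March
# sorts before 3 March even though it is the later day
as_text <- "10/3/2013" < "3/3/2013"

# Convert to Date class so that < compares chronologically
df$date <- as.Date(df$date, format = "%d/%m/%Y")

# One row per location holding its index date
# (assumption: exactly one index row per locid)
idx <- df[df$match == "index", c("locid", "date")]

# For every row, look up the index date of its own location
ref_date <- idx$date[match(df$locid, idx$locid)]

# Keep rows dated strictly before their location's index date;
# the index rows drop out because a date is never < itself
earlier <- df[df$date < ref_date, ]
earlier
```

With the question's data this keeps the same five matched rows as the solutions above, although the row names differ because nothing is split and re-bound. If a location could have no index row, ref_date would be NA for its rows and the subset would return NA rows, so that case would need handling first.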