Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter rows below a given row element within a group

Tags:

r

I have a dataframe with a set of dated events in each row linked to a location. Within each location I have an index event and a series of various matched events that may have happened before and/or after the index event. I need to subset all matched events that happened before the index event for each location. The data structure looks like this.

locid    match      date          score      iid
1        index      4/11/2013      15        1
1        matched    1/09/2013      23        2
1        matched    14/04/2013      1        3
1        matched    7/1/2014       21        4
2        index      2/4/2013       12        1
2        matched    1/2/2013       10        2
3        index      1/5/2013       23        1
3        matched    2/5/2013       10        2
4        index      3/3/2013        9        1
4        matched    10/2/2013      32        2
4        matched    1/10/2012      15        3
4        matched    4/3/2013       12        4
4        matched    10/3/2013      10        5

And I need to subset the dataframe so that I end up only with the rows with a date below the date of the index event for each location:

locid    match      date          score      iid
1        matched    1/09/2013      23        2
1        matched    14/04/2013      1        3
2        matched    1/2/2013       10        2
4        matched    10/2/2013      32        2
4        matched    1/10/2012      15        3

First time I ask here, so I'm hoping I'm not doing this the wrong way. I tried various permutations of solutions within R, but I'm struggling to find the right one.

like image 421
Juanjo Medina Avatar asked Feb 12 '23 15:02

Juanjo Medina


1 Answers

Here's a data.table possibility (assuming your data called df)

library(data.table)
setDT(df)[, date := as.Date(date, format = "%d/%m/%Y")][, 
           .SD[date < date[match == "index"]], by = locid]
#    locid   match       date score iid
# 1:     1 matched 2013-09-01    23   2
# 2:     1 matched 2013-04-14     1   3
# 3:     2 matched 2013-02-01    10   2
# 4:     4 matched 2013-02-10    32   2
# 5:     4 matched 2012-10-01    15   3

Possible base R solution

df <- transform(df, date = as.Date(date, format = "%d/%m/%Y"))
do.call(rbind, by(df, df$locid, FUN = function(x) x[with(x, date < date[match == "index"]), ]))
#      locid   match       date score iid
# 1.2      1 matched 2013-09-01    23   2
# 1.3      1 matched 2013-04-14     1   3
# 2        2 matched 2013-02-01    10   2
# 4.10     4 matched 2013-02-10    32   2
# 4.11     4 matched 2012-10-01    15   3

And another possible base R solution

df <- transform(df, date = as.Date(date, format = "%d/%m/%Y"))
do.call(rbind, lapply(split(df, df$locid), function(x) x[with(x, date < date[match == "index"]), ]))
#      locid   match       date score iid
# 1.2      1 matched 2013-09-01    23   2
# 1.3      1 matched 2013-04-14     1   3
# 2        2 matched 2013-02-01    10   2
# 4.10     4 matched 2013-02-10    32   2
# 4.11     4 matched 2012-10-01    15   3

The basic idea here is to convert your date column to Date class so R will able to identify it's order. Afterwards, we basically split the data by locid and apply a filtering function on each chunk which selects only dates that comes before the date where match == index

like image 102
David Arenburg Avatar answered Feb 22 '23 21:02

David Arenburg