
Filter between a Start point and Stop point

Tags: r, dplyr

I have a dataset that looks like the following:

ID  Cond    Time1   Time2
1   2       Start   Stop1
1   3       Start   abc
1   1       abc     Stop2
1   2       Start   abc
1   2       abc     Stop1
2   2       Start   abc
2   4       abc     jkl
2   3       abc     jkl
2   2       abc     jkl
2   3       abc     Stop2
3   2       Start   abc
3   3       abc     Stop2
3   2       Start   Stop1
3   3       Start   Stop1
3   3       Start   abc
3   2       abc     jkl
3   4       baba    Stop1
4   2       Start   Stop2
4   1       Start   asd
4   2       abc     Stop2

I need to filter the data on two criteria. A block begins at a row where Cond = 2 and Time1 = Start, and I need to keep every row from there through the first stopping point (Time2 equal to either Stop1 or Stop2). Essentially, the result should look like this:

ID  Cond    Time1   Time2
1   2       Start   Stop1
1   2       Start   abc
1   2       abc     Stop1
2   2       Start   abc
2   4       abc     jkl
2   3       abc     jkl
2   2       abc     jkl
2   3       abc     Stop2
3   2       Start   abc
3   3       abc     Stop2
3   2       Start   Stop1
4   2       Start   Stop2

Also, the real dataset has over 140,000 observations, so efficiency is key. I was thinking about using the dplyr package, but I'm not sure how to approach this problem.

akash87 asked Mar 23 '26 23:03


1 Answer

Using dplyr

dframe = read.table(header = TRUE, text = "ID  Cond    Time1   Time2
                    1   2       Start   Stop1
                    1   3       Start   abc
                    1   1       abc     Stop2
                    1   2       Start   abc
                    1   2       abc     Stop1
                    2   2       Start   abc
                    2   4       abc     jkl
                    2   3       abc     jkl
                    2   2       abc     jkl
                    2   3       abc     Stop2
                    3   2       Start   abc
                    3   3       abc     Stop2
                    3   2       Start   Stop1
                    3   3       Start   Stop1
                    3   3       Start   abc
                    3   2       abc     jkl
                    3   4       baba    Stop1
                    4   2       Start   Stop2
                    4   1       Start   asd
                    4   2       abc     Stop2")

library(dplyr)

# add index
dframe = data.frame(index = 1:nrow(dframe), dframe)
head(dframe)

# get starting points
start_points = dframe %>%
  filter(Cond == 2 & Time1 == 'Start') %>%
  select(index, ID)

# get stopping points
stop_points = dframe %>%
  filter(substr(Time2, 1, 4) == 'Stop') %>%
  select(index, ID)

# get the stopping point associated with each start point
start_stop = start_points %>%
  left_join(stop_points, by = "ID") %>%
  filter(index.x <= index.y) %>%
  group_by(ID, index.x) %>%
  summarise(index.y = min(index.y)) %>%
  ungroup() %>%
  rename(start_index = index.x, stop_index = index.y)

# add rows between
result = start_stop %>%
  left_join(dframe, by = "ID") %>%
  filter(start_index <= index, index <= stop_index) %>%
  select(-c(start_index, stop_index, index))

> result
Source: local data frame [12 x 4]

      ID  Cond  Time1  Time2
   (int) (int) (fctr) (fctr)
1      1     2  Start  Stop1
2      1     2  Start    abc
3      1     2    abc  Stop1
4      2     2  Start    abc
5      2     4    abc    jkl
6      2     3    abc    jkl
7      2     2    abc    jkl
8      2     3    abc  Stop2
9      3     2  Start    abc
10     3     3    abc  Stop2
11     3     2  Start  Stop1
12     4     2  Start  Stop2
mbiron answered Mar 25 '26 11:03
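
Since the question mentions efficiency, one alternative to the two joins in the answer above is a single ordered pass that tracks whether the current row lies inside a Start..Stop window. The following is a minimal base R sketch under stated assumptions, not the answerer's method: the helper name keep_between is hypothetical, rows are assumed to be in their original order with each ID contiguous, and a window that never reaches a stop is kept to the end of its ID (the join-based answer drops such windows).

```r
# Same sample data as in the question.
dframe <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "ID Cond Time1 Time2
1 2 Start Stop1
1 3 Start abc
1 1 abc Stop2
1 2 Start abc
1 2 abc Stop1
2 2 Start abc
2 4 abc jkl
2 3 abc jkl
2 2 abc jkl
2 3 abc Stop2
3 2 Start abc
3 3 abc Stop2
3 2 Start Stop1
3 3 Start Stop1
3 3 Start abc
3 2 abc jkl
3 4 baba Stop1
4 2 Start Stop2
4 1 Start asd
4 2 abc Stop2")

# Hypothetical helper: returns a logical vector marking rows that fall
# inside a window opened by is_start and closed by the next is_stop.
# Assumes each ID occupies a contiguous run of rows.
keep_between <- function(id, is_start, is_stop) {
  n <- length(id)
  keep <- logical(n)
  inside <- FALSE
  for (i in seq_len(n)) {
    # A new ID always begins outside any window.
    if (i > 1 && id[i] != id[i - 1]) inside <- FALSE
    # Open a window only when not already inside one.
    if (!inside && is_start[i]) inside <- TRUE
    keep[i] <- inside
    # The stop row itself is kept, then the window closes.
    if (inside && is_stop[i]) inside <- FALSE
  }
  keep
}

is_start <- dframe$Cond == 2 & dframe$Time1 == "Start"
is_stop  <- substr(dframe$Time2, 1, 4) == "Stop"

keep <- keep_between(dframe$ID, is_start, is_stop)
result_scan <- dframe[keep, ]
```

On the sample data this keeps the same 12 rows as the expected output in the question. Because each row is visited once and never duplicated, overlapping start points cannot produce repeated rows, which the join-based approach could do if two qualifying starts preceded the same stop.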


