Below is a data frame that is grouped by the id variable. Variable x takes on values from 0 to 2.
id<- c(1,1,1,2,2,3,3,4,4,4,4,4,4)
x <- c(2,2,0,0,0,1,2,1,1,1,0,2,1)
df <- data.frame(id, x)
For each id, I wish to filter out rows if the value of x changes. For example, for id 1, the value of x changes on the third row, so this row will be discarded. Similarly, for id 4, the value of x changed from the fourth row, so all rows from the fourth row will be filtered out. Where there is no change in x for an id, we keep the data as it is. Here is the expected output
id x
1 1 2
2 1 2
3 2 0
4 2 0
5 3 1
6 4 1
7 4 1
8 4 1
As shown below, I am familiar with the use of group_by and filter functions in filtering grouped data of this kind. However, in this particular case, I am not sure about how to code inside the filter function to filter out data based on the condition described above. Any help is greatly appreciated.
df <- df %>%
group_by(id) %>%
filter()
This is the perfect place to use dplyr::consecutive_id to select the first unique value!
df%>%
group_by(id)%>%
filter(consecutive_id(x)==1)
# A tibble: 8 × 2
# Groups: id [4]
id x
<dbl> <dbl>
1 1 2
2 1 2
3 2 0
4 2 0
5 3 1
6 4 1
7 4 1
8 4 1
Or more succinctly,
df%>%
filter(consecutive_id(x)==1, .by=id)
One way would be to count the number of times x has changed from the prior row within each id, and only keep the first string of values.
df |> filter(cumsum(x != lag(x, 1, Inf)) == 1, .by = id)
id x
1 1 2
2 1 2
3 2 0
4 2 0
5 3 1
6 4 1
7 4 1
8 4 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With