Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter grouped data by condition defined by changes in column values using R

Below is a data frame that is grouped by the id variable. Variable x takes on values from 0 to 2.

id<- c(1,1,1,2,2,3,3,4,4,4,4,4,4)
x <- c(2,2,0,0,0,1,2,1,1,1,0,2,1)
df <- data.frame(id, x)

For each id, I wish to filter out rows if the value of x changes. For example, for id 1, the value of x changes on the third row, so this row will be discarded. Similarly, for id 4, the value of x changed from the fourth row, so all rows from the fourth row will be filtered out. Where there is no change in x for an id, we keep the data as it is. Here is the expected output

  id x
1  1 2
2  1 2
3  2 0
4  2 0
5  3 1
6  4 1
7  4 1
8  4 1 

As shown below, I am familiar with the use of group_by and filter functions in filtering grouped data of this kind. However, in this particular case, I am not sure about how to code inside the filter function to filter out data based on the condition described above. Any help is greatly appreciated.

df <- df %>%
      group_by(id) %>%
      filter()
like image 769
T Richard Avatar asked May 03 '26 08:05

T Richard


2 Answers

This is the perfect place to use dplyr::consecutive_id to select the first unique value!

df%>%
  group_by(id)%>%
  filter(consecutive_id(x)==1)

# A tibble: 8 × 2
# Groups:   id [4]
     id     x
  <dbl> <dbl>
1     1     2
2     1     2
3     2     0
4     2     0
5     3     1
6     4     1
7     4     1
8     4     1

Or more succinctly,

df%>%
  filter(consecutive_id(x)==1, .by=id)
like image 81
one Avatar answered May 05 '26 21:05

one


One way would be to count the number of times x has changed from the prior row within each id, and only keep the first string of values.

df |> filter(cumsum(x != lag(x, 1, Inf)) == 1, .by = id)

  id x
1  1 2
2  1 2
3  2 0
4  2 0
5  3 1
6  4 1
7  4 1
8  4 1
like image 38
Jon Spring Avatar answered May 05 '26 22:05

Jon Spring