How to remove all rows after a certain point by group using dplyr?

Tags: r, dplyr

I have a data frame:

test_df <- data.frame(
  x = c(rep("a", 5), rep("b", 5)), 
  y = c(1, 2, NA, 2, 3, NA, 1, 2, 3, 1)
)

Within each group defined by column x, I would like to remove all rows after the first row where y == 2. Is there a way to do this in dplyr?

My desired result is to go from:

   x  y
1  a  1
2  a  2
3  a NA
4  a  2
5  a  3
6  b NA
7  b  1
8  b  2
9  b  3
10 b  1

to:

   x  y
1  a  1
2  a  2
6  b NA
7  b  1
8  b  2

asked Dec 06 '22 18:12 by Hao


2 Answers

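What about this? Within each group, it keeps the rows up to and including the first y == 2:

library(dplyr)

group_by(test_df, x) %>% slice(seq_len(min(which(y == 2))))
# Source: local data frame [5 x 2]
# Groups: x [2]
#
#        x     y
#   (fctr) (dbl)
# 1      a     1
# 2      a     2
# 3      b    NA
# 4      b     1
# 5      b     2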
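
To spell out how the one-liner above works: which(y == 2) returns the positions where y equals 2 (NA comparisons are dropped), min() picks the first of them, and seq_len() expands that into the row indices 1..first for slice(). One caveat: if a group contains no 2 at all, which() returns an empty vector, min() yields Inf with a warning, and seq_len(Inf) fails. The following is a minimal sketch of one way to guard against that, not part of the original answer. The helper name first_hit_or_end and the extra group "c" are assumptions added for illustration.

```r
library(dplyr)

# Question data plus a hypothetical group "c" that never reaches y == 2
test_df <- data.frame(
  x = c(rep("a", 5), rep("b", 5), "c"),
  y = c(1, 2, NA, 2, 3, NA, 1, 2, 3, 1, 1)
)

# Hypothetical helper: position of the first TRUE in `hit`,
# or the length of `hit` when there is no TRUE at all.
first_hit_or_end <- function(hit) {
  pos <- which(hit)  # which() ignores NA, so NA comparisons never match
  if (length(pos) > 0) pos[1] else length(hit)
}

result <- test_df %>%
  group_by(x) %>%
  slice(seq_len(first_hit_or_end(y == 2))) %>%  # rows 1..first match per group
  ungroup()

print(result)
```

With this guard, a group without any 2 is kept whole instead of raising an error.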

answered May 24 '23 04:05 by DatamineR


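library(dplyr)

group_by(df, x) %>%
    # position of the first y == 2; falls back to the last row if the group has none
    mutate(first2 = min(which(y == 2 | row_number() == n()))) %>%
    # keep rows up to and including that position
    filter(row_number() <= first2) %>%
    # drop the helper column
    select(-first2)
# Source: local data frame [6 x 2]
# Groups: x [3]
# 
#        x     y
#   (fctr) (int)
# 1      a     1
# 2      a     2
# 3      b    NA
# 4      b     1
# 5      b     2
# 6      c     1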

Using this data, which adds a third group c that contains no y == 2:

df = structure(list(x = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"), y = c(1L, 2L, 
NA, 2L, 3L, NA, 1L, 2L, 3L, 1L, 1L)), .Names = c("x", "y"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
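
As a possible alternative that avoids the helper column, a running count can mark everything after the first match. This is a sketch under assumptions rather than the answerer's method: it rebuilds the same data with data.frame(), uses y %in% 2 so that NA counts as "not a match", and keeps a row only while the number of matches seen in earlier rows of its group is still zero.

```r
library(dplyr)

# Same values as the structure() call above, rebuilt for readability
df <- data.frame(
  x = factor(c(rep("a", 5), rep("b", 5), "c")),
  y = c(1L, 2L, NA, 2L, 3L, NA, 1L, 2L, 3L, 1L, 1L)
)

result <- df %>%
  group_by(x) %>%
  # cumsum(y %in% 2) counts matches up to and including each row;
  # lag() shifts it by one, giving the count of matches in earlier rows only.
  filter(dplyr::lag(cumsum(y %in% 2), default = 0L) == 0L) %>%
  ungroup()

print(result)
```

Because the count starts at zero and only increases after a match, a group with no 2 is kept whole without any special case.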

answered May 24 '23 04:05 by Gregor Thomas