 

Delete a row in R based on next row condition

Tags: r

I'm having the following problem:

I have a large data frame from which I'd like to delete every row that meets this condition: if the string in a given column contains the ":" character and the string in the next row also contains ":", the first of the two rows should be deleted.

For example, I'd like to go from this:

a <- c("value1","value2","value2:a","value2:b","value3")
b <- c(1,2,3,4,5)
df1 <- data.frame(b,a)

  b        a
1 1   value1
2 2   value2
3 3 value2:a
4 4 value2:b
5 5   value3

to this, where only the row containing "value2:a" is deleted, because it is followed by a row that also matches the regex ":":

a        b
value1   1
value2   2
value2:b 4
value3   5

Thank you very much in advance. I've been trying out some solutions with for loops and the grepl function, but I can't seem to make them work.

Felipe Del Valle asked Oct 07 '20 18:10


4 Answers

You don't need loops for this. You can do it in a single line of base R by combining the grepl result for each row with the grepl result for the row after it:

df1[!c(head(grepl(":", df1$a), -1) & tail(grepl(":", df1$a), -1), FALSE),]
#>   b        a
#> 1 1   value1
#> 2 2   value2
#> 4 4 value2:b
#> 5 5   value3
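
To see why this works, it can help to split the one-liner into its intermediate vectors. The sketch below rebuilds df1 from the question; the names has_colon, this_row, next_row, to_drop and result are illustrative only and are not part of the answer above.

```r
# Sample data from the question
a <- c("value1", "value2", "value2:a", "value2:b", "value3")
b <- c(1, 2, 3, 4, 5)
df1 <- data.frame(b, a)

# TRUE where the string contains ":"
has_colon <- grepl(":", df1$a)   # FALSE FALSE TRUE TRUE FALSE

# Pair rows 1..(n-1) with rows 2..n
this_row <- head(has_colon, -1)  # FALSE FALSE TRUE TRUE
next_row <- tail(has_colon, -1)  # FALSE TRUE TRUE FALSE

# The last row has no successor, so it is never dropped
to_drop <- c(this_row & next_row, FALSE)  # FALSE FALSE TRUE FALSE FALSE

result <- df1[!to_drop, ]
```

Only row 3 ("value2:a") is flagged, so result holds rows 1, 2, 4 and 5, the same as the one-liner.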
Allan Cameron answered Oct 21 '22 12:10


Try this:

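library(dplyr)

df1 %>%
    filter(
        # default = FALSE: the last row has no successor; without it lead()
        # gives NA there, and filter() would drop a final row containing ":"
        !(stringr::str_detect(a, ":") & lead(stringr::str_detect(a, ":"), default = FALSE))
    )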
user438383 answered Oct 21 '22 12:10


Here is a dplyr solution.

library(dplyr)

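df1 %>%
  mutate(flag = grepl(":", a),
         # TRUE only when this row and the next row both contain ":";
         # comparing neighbours directly also handles runs longer than two
         flag = flag & lead(flag, default = FALSE)) %>%
  filter(!flag) %>%
  dplyr::select(-flag)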
#  b        a
#1 1   value1
#2 2   value2
#3 4 value2:b
#4 5   value3
Rui Barradas answered Oct 21 '22 12:10


Here you go. For tasks like this I always like to use a couple of masks.

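a <- c("value1","value2","value2:a","value2:b","value3")
b <- c(1,2,3,4,5)
# Store `a` as the row names (as.data.frame(b, a) did this positionally)
df1 <- data.frame(b, row.names = a)
# ":" where the row name contains a colon, NA otherwise
vec <- stringr::str_extract(string = rownames(df1), pattern = ":")
# TRUE for every value already seen earlier in vec
mask_colon <- duplicated(vec, fromLast = FALSE)
mask_na <- is.na(vec)
# Keep rows without ":" plus every ":" row except the first one.
# Note: this fits the question only when the data has a single pair of adjacent ":" rows.
df1 <- df1[which(mask_na | mask_colon), , drop = FALSE]
df1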
#>          b
#> value1   1
#> value2   2
#> value2:b 4
#> value3   5

Created on 2020-10-07 by the reprex package (v0.3.0)
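
The same masking idea can also be applied to the a column directly, comparing each row with the one after it. This is only a sketch: the function name drop_if_next_matches and the data frame df2 are made up for illustration, and it assumes the intended rule is "drop a row when it and the next row both contain the pattern", including in runs longer than two.

```r
# Hypothetical helper: drop each row whose value contains `pattern`
# when the following row contains it too.
drop_if_next_matches <- function(df, col, pattern = ":") {
  # fixed = TRUE treats `pattern` as a literal string, not a regex
  hit <- grepl(pattern, df[[col]], fixed = TRUE)
  # Shift the mask up by one; the last row has no successor
  next_hit <- c(hit[-1], FALSE)
  df[!(hit & next_hit), , drop = FALSE]
}

# Made-up data with a run of three ":" rows and one isolated ":" row
df2 <- data.frame(
  b = 1:7,
  a = c("x", "x:a", "x:b", "x:c", "y", "z:a", "w")
)
res <- drop_if_next_matches(df2, "a")
res
```

Here "x:a" and "x:b" are dropped because each is followed by another ":" row, while "x:c" and the isolated "z:a" are kept.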

David Mas answered Oct 21 '22 11:10