I'm having the following problem:
I have a large data frame from which I'd like to delete every row that matches this condition: if the value (a string) in a column contains the ":" character and the value in the next row also contains ":", the first of the two rows should be deleted.
Something like this:
a <- c("value1","value2","value2:a","value2:b","value3")
b <- c(1,2,3,4,5)
df1 <- data.frame(b,a)
b a
1 1 value1
2 2 value2
3 3 value2:a
4 4 value2:b
5 5 value3
to this, where only the row containing the name "value2:a" gets deleted, because it is followed by a row that also matches the regex ":"
a b
value1 1
value2 2
value2:b 4
value3 5
Thank you very much in advance, I've been trying out some solutions with for loops and the grepl function but can't seem to make it work.
You don't need loops for this. You can do it in a single line of base R by combining a grepl with a copy of the same grepl shifted by one row:
df1[!c(head(grepl(":", df1$a), -1) & tail(grepl(":", df1$a), -1), FALSE),]
#> b a
#> 1 1 value1
#> 2 2 value2
#> 4 4 value2:b
#> 5 5 value3
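To see why this works, the one-liner can be unpacked into named steps. This is only a sketch of the same idea; the names `has_colon`, `next_has_colon`, `drop_row` and `result` are illustrative and not part of the answer above.

a <- c("value1", "value2", "value2:a", "value2:b", "value3")
b <- c(1, 2, 3, 4, 5)
df1 <- data.frame(b, a)

# TRUE where the value contains a literal ":"
has_colon <- grepl(":", df1$a, fixed = TRUE)

# The same vector shifted up by one row; the last row has no
# successor, so it is padded with FALSE
next_has_colon <- c(tail(has_colon, -1), FALSE)

# Drop a row only when it and the row after it both contain ":"
drop_row <- has_colon & next_has_colon
result <- df1[!drop_row, ]
result

Because the last element of `next_has_colon` is always FALSE, the final row is never dropped, which matches the `FALSE` appended in the one-liner.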
Try this:
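library(dplyr)
df1 %>%
  dplyr::filter(
    # default = "" avoids an NA comparison for the last row, which filter() would otherwise drop
    !(stringr::str_detect(a, ":") & stringr::str_detect(dplyr::lead(a, default = ""), ":"))
  )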
Here is a dplyr solution.
library(dplyr)
df1 %>%
  mutate(flag = grepl(":", a),                            # TRUE where a contains ":"
         flag = flag & lead(flag, default = FALSE)) %>%   # ... and the next row does too
  filter(!flag) %>%
  dplyr::select(-flag)
# b a
#1 1 value1
#2 2 value2
#3 4 value2:b
#4 5 value3
Here you go. For tasks like this I always like to use a couple of masks.
a <- c("value1","value2","value2:a","value2:b","value3")
b <- c(1,2,3,4,5)
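df1 <- as.data.frame(b, row.names = a)  # `a` becomes the row names
# Mask 1: rows whose name contains ":"
mask_colon <- stringr::str_detect(rownames(df1), ":")
# Mask 2: rows whose next row contains ":" (the last row has no next row)
mask_next <- c(mask_colon[-1], FALSE)
# Drop only the rows where both masks are TRUE
df1 <- df1[!(mask_colon & mask_next), , drop = FALSE]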
df1
#> b
#> value1 1
#> value2 2
#> value2:b 4
#> value3 5
Created on 2020-10-07 by the reprex package (v0.3.0)