Within each group, I want to delete a row if its value matches the value in the previous row.
x <- c(1,1,1,1,2,2,2,2)
y <- c("A","B","B","A","A","A","B","B")
xy <- data.frame(x,y)
colnames(xy)<-c("group","value")
xy
The result should be:
x <- c(1,1,1,2,2)
y <- c("A","B","A","A","B")
result_df <- data.frame(x,y)
colnames(result_df)<-c("group","value")
result_df
I think I have to apply something with lag, but I can't work out how.
You are correct that lag is an appropriate way to do this comparison. First, group_by your group column so the comparison happens within each group, then filter to keep only the rows where value differs from lag(value), i.e. the previous value. The is.na condition compensates for the first lag value being NA in each group: without it, the comparison for that row evaluates to NA and filter would drop the row.
library(dplyr)
xy %>% group_by(group) %>% filter(value!=lag(value) | is.na(lag(value)))
# A tibble: 5 x 2
# Groups: group [2]
# group value
# <dbl> <fct>
# 1 1.00 A
# 2 1.00 B
# 3 1.00 A
# 4 2.00 A
# 5 2.00 B
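To see why the is.na condition is needed, it can help to look at the lagged values before filtering. The following is a minimal sketch, assuming the same example data as in the question; the column names prev and keep are illustrative and not part of the original answer.

```r
library(dplyr)

# Same example data as in the question
xy <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2, 2),
  value = c("A", "B", "B", "A", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# prev and keep are hypothetical helper columns added for inspection only
steps <- xy %>%
  group_by(group) %>%
  mutate(
    prev = lag(value),                    # previous value within the group
    keep = is.na(prev) | value != prev    # first row of a group has prev = NA
  ) %>%
  ungroup()

print(steps)
```

Rows where keep is TRUE are the ones the filter call retains; the first row of each group has prev equal to NA and is kept only because of the is.na check.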
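# Base R alternative: drop a row when it is identical to the row above it
# (both group and value match), so a repeated value across a group boundary is kept.
n <- nrow(xy)
xy[!c(FALSE, rowMeans(xy[-1, ] == xy[-n, ]) == 1), ]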
group value
1 1 A
2 1 B
4 1 A
5 2 A
7 2 B
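The base R one-liner compares each row with the one above it across all columns at once. Spelled out column by column, the same idea might look like the sketch below; same_group, same_value, and dup are illustrative names, and the data is assumed to be the example from the question, already ordered so that rows of the same group are adjacent.

```r
# Same example data as in the question
xy <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2, 2),
  value = c("A", "B", "B", "A", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

n <- nrow(xy)

# Compare rows 2..n with rows 1..(n-1), one column at a time
same_group <- xy$group[-1] == xy$group[-n]
same_value <- xy$value[-1] == xy$value[-n]

# The first row has no predecessor, so it is never marked as a duplicate
dup <- c(FALSE, same_group & same_value)

result <- xy[!dup, ]
print(result)
```

Because the group column is part of the comparison, a value that repeats across a group boundary is not treated as a duplicate.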