I have a data set with repeating rows. I want to remove consecutive repeated and count them but only if they're consecutive. I'm looking for an efficient way to do this. Can't think of how in dplyr or data.table.
dat <- data.frame( x = c(6, 2, 3, 3, 3, 1, 1, 6, 5, 5, 6, 6, 5, 4), y = c(7, 5, 7, 7, 7, 5, 5, 7, 1, 2, 7, 7, 1, 7), z = c(rep(LETTERS[1:2], each=7)) ) ## x y z ## 1 6 7 A ## 2 2 5 A ## 3 3 7 A ## 4 3 7 A ## 5 3 7 A ## 6 1 5 A ## 7 1 5 A ## 8 6 7 B ## 9 5 1 B ## 10 5 2 B ## 11 6 7 B ## 12 6 7 B ## 13 5 1 B ## 14 4 7 B
x y z n 1 6 7 A 1 2 2 5 A 1 3 3 7 A 3 4 1 5 A 2 5 6 7 B 1 6 5 1 B 1 7 5 2 B 1 8 6 7 B 2 9 5 1 B 1 10 4 7 B 1
Select the range of cells that has duplicate values you want to remove. Tip: Remove any outlines or subtotals from your data before trying to remove duplicates. Click Data > Remove Duplicates, and then Under Columns, check or uncheck the columns where you want to remove the duplicates.
In Excel, there are several ways to filter for unique values—or remove duplicate values: To filter for unique values, click Data > Sort & Filter > Advanced. To remove duplicate values, click Data > Data Tools > Remove Duplicates.
With data.table:
library(data.table) setDT(dat) dat[, c(.SD[1L], .N), by=.(g = rleidv(dat))][, g := NULL] x y z N 1: 6 7 A 1 2: 2 5 A 1 3: 3 7 A 3 4: 1 5 A 2 5: 6 7 B 1 6: 5 1 B 1 7: 5 2 B 1 8: 6 7 B 2 9: 5 1 B 1 10: 4 7 B 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With