I would like to remove duplicated rows, but only when the duplicates are consecutive. A representative example follows.
My input df:
df <- "NAME VALUE
Prb1 0.05
Prb2 0.05
Prb3 0.05
Prb4 0.06
Prb5 0.06
Prb6 0.01
Prb7 0.10
Prb8 0.05"
df <- read.table(text=df, header=T)
My expected outdf:
outdf <- "NAME VALUE
Prb1 0.05
Prb4 0.06
Prb6 0.01
Prb7 0.10
Prb8 0.05"
outdf <- read.table(text=outdf, header=T)
rle() is a nice function that identifies runs of identical values, but it can be a pain to wrestle its output into a usable form. Here's a relatively painless incantation that works in your case: it numbers the rows within each run and keeps only the first row of every run.
df[sequence(rle(df$VALUE)$lengths) == 1, ]
# NAME VALUE
# 1 Prb1 0.05
# 4 Prb4 0.06
# 6 Prb6 0.01
# 7 Prb7 0.10
# 8 Prb8 0.05
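To see why the one-liner works, it may help to look at the intermediate values. The sketch below rebuilds the example data, prints the within-run positions, and then shows an alternative that compares each VALUE with the one on the previous row. The names runs, pos and keep are only illustrative, and the comparison approach assumes the data frame has at least one row and no NA in VALUE.

```r
df <- read.table(text = "NAME VALUE
Prb1 0.05
Prb2 0.05
Prb3 0.05
Prb4 0.06
Prb5 0.06
Prb6 0.01
Prb7 0.10
Prb8 0.05", header = TRUE)

# rle() reports the length of each run of identical consecutive values.
runs <- rle(df$VALUE)
runs$lengths
# 3 2 1 1 1

# sequence() expands those lengths into a position within each run.
pos <- sequence(runs$lengths)
pos
# 1 2 3 1 2 1 1 1

# Alternative: keep the first row, plus every row whose VALUE differs
# from the VALUE on the row directly above it.
keep <- c(TRUE, df$VALUE[-1] != df$VALUE[-nrow(df)])
df[keep, ]
```

Both approaches select rows 1, 4, 6, 7 and 8 here. Note that Prb8 is kept even though 0.05 appeared earlier, because it is not adjacent to the earlier run.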