
Exclude subsequent duplicated rows

I would like to remove duplicated rows, but only when the duplicates occur in consecutive rows. A representative example follows:

My input df:

    df <- "NAME   VALUE 
    Prb1  0.05
    Prb2  0.05
    Prb3  0.05
    Prb4  0.06
    Prb5  0.06
    Prb6  0.01
    Prb7  0.10
    Prb8  0.05"

    df <- read.table(text=df, header=TRUE)

My expected outdf:

    outdf <- "NAME   VALUE
    Prb1  0.05
    Prb4  0.06
    Prb6  0.01
    Prb7  0.10
    Prb8  0.05"

    outdf <- read.table(text=outdf, header=TRUE)

user2120870 asked May 15 '15 13:05



1 Answer

rle() is a nice function that identifies runs of identical values, but it can be kind of a pain to wrestle its output into a usable form. Here's a relatively painless incantation that works in your case:

    df[sequence(rle(df$VALUE)$lengths) == 1, ]
    #   NAME VALUE
    # 1 Prb1  0.05
    # 4 Prb4  0.06
    # 6 Prb6  0.01
    # 7 Prb7  0.10
    # 8 Prb8  0.05
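
To see why the one-liner works, it may help to unpack it step by step. The sketch below rebuilds the example data and uses illustrative variable names (`runs`, `pos`, `keep`, `res_rle`, `is_new`, `res_lag`) that are not part of the original answer. It also shows one possible base R alternative that compares each value with the one before it. That alternative is an assumption about what might suit the same problem, not something from the answer, and it assumes the data frame has at least one row.

```r
# Rebuild the example data from the question
df <- read.table(text = "NAME VALUE
Prb1 0.05
Prb2 0.05
Prb3 0.05
Prb4 0.06
Prb5 0.06
Prb6 0.01
Prb7 0.10
Prb8 0.05", header = TRUE)

# Step 1: rle() reports the length of each run of identical values
runs <- rle(df$VALUE)
runs$lengths
# 3 2 1 1 1

# Step 2: sequence() turns the run lengths into a position within each run
pos <- sequence(runs$lengths)
pos
# 1 2 3 1 2 1 1 1

# Step 3: position 1 marks the first row of every run, so keep only those rows
keep <- pos == 1
res_rle <- df[keep, ]

# Alternative sketch (not from the answer): keep a row when its value differs
# from the previous row's value; the first row is always kept
is_new <- c(TRUE, df$VALUE[-1] != df$VALUE[-nrow(df)])
res_lag <- df[is_new, ]
```

Both versions rely on exact equality between neighbouring values. That holds for numbers parsed from identical text, as here, but computed doubles may need rounding first.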

Josh O'Brien answered Oct 27 '22 09:10