I have the following data frame: <pre class="prettyprint"><code>a a a b c c d e a a b b b e e d d </code></pre> The required result is <pre class="prettyprint"><code>a b c d e a b e d </code></pre> That is, no two consecutive rows should have the same value. How can this be done without using a loop? My data set is quite large, and looping takes a long time to execute. The data frame is structured like this: <pre class="prettyprint"><code>a 1
a 2
a 3
b 2
c 4
c 1
d 3
e 9
a 4
a 8
b 10
b 199
e 2
e 5
d 4
d 10
</code></pre> Result: <pre class="prettyprint"><code>a 1
b 2
c 4
d 3
e 9
a 4
b 10
e 2
d 4
</code></pre> It should delete the entire row.

One easy way is to use <code>rle</code>. Here's your sample data: <pre class="prettyprint"><code>x <- scan(what = character(), text = "a a a b c c d e a a b b b e e d d")
# Read 17 items
</code></pre> <code>rle</code> returns a <code>list</code> with two components: the run lengths ("<code>lengths</code>") and the value that is repeated in each run ("<code>values</code>"). <pre class="prettyprint"><code>rle(x)$values
# [1] "a" "b" "c" "d" "e" "a" "b" "e" "d"
</code></pre> <hr> <h3>Update: For a <code>data.frame</code> </h3> If you are working with a <code>data.frame</code>, try something like the following: <pre class="prettyprint"><code>## Sample data
mydf <- data.frame(
  V1 = c("a", "a", "a", "b", "c", "c", "d", "e",
         "a", "a", "b", "b", "e", "e", "d", "d"),
  V2 = c(1, 2, 3, 2, 4, 1, 3, 9,
         4, 8, 10, 199, 2, 5, 4, 10),
  stringsAsFactors = FALSE  # keep V1 as character; rle() does not accept factors
)

## Use rle, as before
X <- rle(mydf$V1)
## Identify the rows you want to keep (the first row of each run)
Y <- cumsum(c(1, X$lengths[-length(X$lengths)]))
Y
# [1]  1  4  5  7  8  9 11 13 15
mydf[Y, ]
#    V1 V2
# 1   a  1
# 4   b  2
# 5   c  4
# 7   d  3
# 8   e  9
# 9   a  4
# 11  b 10
# 13  e  2
# 15  d  4
</code></pre> <hr> <h3>Update 2</h3> The "data.table" package has a function <code>rleid</code> that lets you do this quite easily. Using <code>mydf</code> from above, try: <pre class="prettyprint"><code>library(data.table)
as.data.table(mydf)[, .SD[1], by = rleid(V1)]
#    rleid V2
# 1:     1  1
# 2:     2  2
# 3:     3  4
# 4:     4  3
# 5:     5  9
# 6:     6  4
# 7:     7 10
# 8:     8  2
# 9:     9  4
</code></pre>
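For comparison, the same rows can be selected in base R without <code>rle</code> by comparing each value with the one before it. This is only a sketch, not part of the answer above: <code>starts_run</code> is a hypothetical helper name, and the code assumes the column contains no <code>NA</code> values.

```r
# Sample data copied from the answer above
mydf <- data.frame(
  V1 = c("a", "a", "a", "b", "c", "c", "d", "e",
         "a", "a", "b", "b", "e", "e", "d", "d"),
  V2 = c(1, 2, 3, 2, 4, 1, 3, 9, 4, 8, 10, 199, 2, 5, 4, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a base R function): TRUE where a value differs
# from the value before it. The first element always starts a run.
# Assumes x contains no NA values.
starts_run <- function(x) {
  n <- length(x)
  if (n == 0L) return(logical(0))
  c(TRUE, x[-1L] != x[-n])
}

keep <- starts_run(mydf$V1)  # one logical flag per row
result <- mydf[keep, ]       # first row of every run, whole rows kept
```

The comparison is fully vectorised, so no explicit loop is involved, and <code>which(keep)</code> gives the same row indices as <code>Y</code> in the <code>rle</code> version.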

<pre class="prettyprint"><code>library(dplyr)
x <- c("a", "a", "a", "b", "c", "c", "d", "e", "a", "a", "b", "b", "b", "e", "e", "d", "d")
# default must be a character value that never occurs in x
x[x != lag(x, default = "1")]
#[1] "a" "b" "c" "d" "e" "a" "b" "e" "d"
</code></pre> EDIT: For a <code>data.frame</code> <pre class="prettyprint"><code>mydf <- data.frame(
  V1 = c("a", "a", "a", "b", "c", "c", "d", "e",
         "a", "a", "b", "b", "e", "e", "d", "d"),
  V2 = c(1, 2, 3, 2, 4, 1, 3, 9,
         4, 8, 10, 199, 2, 5, 4, 10),
  stringsAsFactors = FALSE)
</code></pre> The dplyr solution is a one-liner: <pre class="prettyprint"><code>mydf %>% filter(V1 != lag(V1, default = "1"))
#  V1 V2
#1  a  1
#2  b  2
#3  c  4
#4  d  3
#5  e  9
#6  a  4
#7  b 10
#8  e  2
#9  d  4
</code></pre> P.S. <code>lead(x,1)</code>, suggested by @Carl Witthoft, compares each value with the next one instead of the previous one, so it keeps the last row of each run rather than the first. <pre class="prettyprint"><code>leadit <- function(x) x != lead(x, default = "what")
rows <- leadit(mydf[, 1])
mydf[rows, ]
#   V1  V2
#3   a   3
#4   b   2
#6   c   1
#7   d   3
#8   e   9
#10  a   8
#12  b 199
#14  e   5
#16  d  10
</code></pre>
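The difference between the <code>lag</code> and <code>lead</code> variants (first row of each run versus last row) can also be seen from the run lengths alone. The following is a minimal base R sketch, assuming the same sample data and no <code>NA</code> values. The names <code>runs</code>, <code>first_rows</code> and <code>last_rows</code> are illustrative only.

```r
# Sample data copied from the answer above
mydf <- data.frame(
  V1 = c("a", "a", "a", "b", "c", "c", "d", "e",
         "a", "a", "b", "b", "e", "e", "d", "d"),
  V2 = c(1, 2, 3, 2, 4, 1, 3, 9, 4, 8, 10, 199, 2, 5, 4, 10),
  stringsAsFactors = FALSE
)

runs <- rle(mydf$V1)

# The running total of the run lengths is the index of each run's last row
last_rows <- cumsum(runs$lengths)

# Stepping back by (length - 1) gives the index of each run's first row
first_rows <- last_rows - runs$lengths + 1L

first_kept <- mydf[first_rows, ]  # same rows as the lag() filter
last_kept  <- mydf[last_rows, ]   # same rows as the lead() filter
```

Which variant to use depends on whether the first or the last <code>V2</code> value of a run should survive. The <code>V1</code> column is identical in both results.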

Remove/collapse consecutive duplicate values in sequence
