
Counting unequal elements in-between equal elements in R df column

Tags: dataframe, r

I'm quite new to R, and while I have done some data wrangling with it, I am completely at a loss as to how to tackle this problem. Searching Google and SO hasn't gotten me anywhere so far. If this is a duplicate, I'm sorry; please point me to the right solution.

I have a data frame (df) with two columns, id and seq, like so:

set.seed(12)
id <- rep(1:2, 10)
seq <- sample(1:4, 20, replace = TRUE)
df <- data.frame(id, seq)
df <- df[order(df$id), ]  # sort so that all rows of one id are contiguous

    id seq  
 1   1   1
 3   1   4
 5   1   1
 7   1   1
 9   1   1
 11  1   2
 13  1   2
 15  1   2
 17  1   2
 19  1   3
 2   2   4
 4   2   2
 6   2   1
 8   2   3
 10  2   1
 12  2   4
 14  2   2
 16  2   2
 18  2   3
 20  2   1

I need to count the number of unequal elements between equal elements in the seq column, e.g. how many elements lie between one 1 and the next 1, or between one 3 and the next 3. The first instance of an element should be NA because there is no earlier element to count from. If the next element is identical, the result should be 0, as there is no unequal element in between, e.g. 1 and 1. The results should be written to a new column, e.g. delay.

One catch is that this process would have to start again once a new id starts in the id column (here: 1 & 2).
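To make the rule concrete, here is a minimal sketch for a single id only. It uses the seq values of id 1 from the table above, typed in by hand so the result does not depend on the random number generator. The helper name count_between is hypothetical and only for illustration.

```r
# Hypothetical helper (illustration only): for each element of x, count the
# elements that sit between it and the previous occurrence of the same value.
count_between <- function(x) {
  delay <- rep(NA_real_, length(x))      # NA unless an earlier match is found
  for (i in seq_along(x)) {
    # positions before i that hold the same value as x[i]
    earlier <- which(x[seq_len(i - 1)] == x[i])
    if (length(earlier) > 0) {
      delay[i] <- i - max(earlier) - 1   # gap to the most recent match
    }
  }
  delay
}

# seq values of id 1, copied from the table above
delays <- count_between(c(1, 4, 1, 1, 1, 2, 2, 2, 2, 3))
print(delays)
# expected: NA NA 1 0 0 NA 0 0 0 NA
```

The loop is quadratic in the length of x, which is fine for illustrating the rule but not meant as an efficient solution.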

This is what I would love to have as output:

     id seq   delay 
 1   1   1     NA
 3   1   4     NA
 5   1   1     1
 7   1   1     0
 9   1   1     0
 11  1   2     NA
 13  1   2     0
 15  1   2     0
 17  1   2     0
 19  1   3     NA
 2   2   4     NA
 4   2   2     NA
 6   2   1     NA
 8   2   3     NA
 10  2   1     1
 12  2   4     4
 14  2   2     4
 16  2   2     0
 18  2   3     4
 20  2   1     4

I really hope someone can help me figure this out and allow me to learn more about this.

asked Aug 09 '18 12:08 by Vanessa S.


1 Answer

A simple dplyr solution:

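library(dplyr)

df %>%
  mutate(row = 1:n()) %>%                 # remember each row's overall position
  group_by(id, seq) %>%                   # compare only rows sharing id and seq
  mutate(delay = row - lag(row) - 1) %>%  # rows skipped since previous match; NA for the first
  select(-row)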
# # A tibble: 20 x 3
# # Groups:   id, seq [8]
#       id   seq delay
#    <int> <int> <dbl>
#  1     1     1    NA
#  2     1     4    NA
#  3     1     1     1
#  4     1     1     0
#  5     1     1     0
#  6     1     2    NA
#  7     1     2     0
#  8     1     2     0
#  9     1     2     0
# 10     1     3    NA
# 11     2     4    NA
# 12     2     2    NA
# 13     2     1    NA
# 14     2     3    NA
# 15     2     1     1
# 16     2     4     4
# 17     2     2     4
# 18     2     2     0
# 19     2     3     4
# 20     2     1     4
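
For readers without dplyr, the same idea can be sketched in base R. This is an alternative sketch, not the method used above: it assumes the rows are already sorted by id, and it rebuilds the data by hand from the table in the question. The names row and key are illustrative.

```r
# Data typed in from the table in the question (assumption: same values,
# already sorted by id), so no random sampling is involved.
df <- data.frame(
  id  = rep(1:2, each = 10),
  seq = c(1, 4, 1, 1, 1, 2, 2, 2, 2, 3,
          4, 2, 1, 3, 1, 4, 2, 2, 3, 1)
)

row <- seq_len(nrow(df))        # overall row position
key <- paste(df$id, df$seq)     # one group per id/seq combination

# Within each group: NA for the first occurrence, then the difference
# between consecutive row positions minus one.
df$delay <- ave(row, key, FUN = function(p) c(NA, diff(p) - 1))
print(df)
```

Using a pasted key rather than passing id and seq to ave() separately avoids calling the function on id/seq combinations that never occur in the data.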
answered Oct 17 '22 14:10 by Scarabee