
Counting the number of times the next element in a vector is different to the previous one

Tags: r

I have a matrix that looks like this:

a=c(rep(0,5),rep(1,5),rep(2,5))
b=c(rep(1,5),rep(1,5),rep(2,5))
d=rbind(a,b)

  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
a    0    0    0    0    0    1    1    1    1     1     2     2     2     2     2
b    1    1    1    1    1    1    1    1    1     1     2     2     2     2     2

What I want to do is count the number of times the value changes along each row. For instance, the first row has 2 changes: one from column 5 to 6 and one from column 10 to 11.

I used a for loop and an if statement to compare each value with the next one, and a counter c to count the number of times a change occurs:

# one result per row of d
m=matrix(NA, nrow = nrow(d), ncol = 1)

for (s in seq_len(nrow(d))){

  # number of changes found in row s
  c=0

  # compare each element with its right-hand neighbour
  for (i in seq_len(ncol(d)-1)){

    if (d[s,i]!=d[s,(i+1)]){
      c=c+1
    }

  }

  m[s,1]<-c
}

At the end I have a matrix m with the number of switches on every row. However, my data has thousands of rows and thousands of columns and this script is taking way too long to count the changes.

GabrielMontenegro asked Feb 24 '16 15:02


1 Answer

You could also try this:

apply(d,1,function(x) length(rle(x)$values)-1)

This call iterates over every row of the matrix d. The iteration is done by apply(); its second argument (the margin) is 1, which selects rows (a margin of 2 would select columns).
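To see what the margin does, here is a small sketch that rebuilds the matrix d from the question and asks apply() for the length of each slice it hands to the function. The names by_row and by_col are only for illustration:

```r
# Rebuild the example matrix from the question
a <- c(rep(0, 5), rep(1, 5), rep(2, 5))
b <- c(rep(1, 5), rep(1, 5), rep(2, 5))
d <- rbind(a, b)

# Margin 1: the function receives one row at a time (15 values each)
by_row <- apply(d, 1, length)

# Margin 2: the function receives one column at a time (2 values each)
by_col <- apply(d, 2, length)
```

With margin 1 the result has one entry per row, which is what we want for counting changes along each row.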

So we apply the anonymous function function(x) length(rle(x)$values)-1 to every row, which is temporarily stored in x. According to help(rle), the rle() function does the following:

Compute the lengths and values of runs of equal values in a vector
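As a quick illustration, this is what rle() returns for the first row of the example matrix (x here is just a copy of row a from the question):

```r
# First row of the example matrix: five 0s, five 1s, five 2s
x <- c(rep(0, 5), rep(1, 5), rep(2, 5))

r <- rle(x)

r$lengths  # 5 5 5  (how long each run is)
r$values   # 0 1 2  (the value repeated in each run)
```

Three runs means the value changed twice along the row.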

We are not interested in the lengths of the runs, and in fact we don't even need the values themselves, which are stored in rle(x)$values. All that matters is how many runs of equal values the vector contains. Since rle(x)$values holds one entry per run, the length() function, which returns the number of entries in a vector, gives us that count. Finally, since a non-empty vector always has at least one run and we want to know how often the value changes, we subtract 1 from the result of length().
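Putting it together on the example data, the sketch below runs the one-liner and checks it against the counts the question expects (2 changes in row a, 1 in row b). It also shows a different technique that is not part of the answer above: comparing the matrix with a copy of itself shifted by one column and summing the mismatches with rowSums(). That variant avoids a per-row function call and may be faster on large matrices, but this is an assumption I have not benchmarked here, and it assumes the data contains no NA values. The names changes_rle and changes_shift are only for illustration.

```r
# Rebuild the example matrix from the question
a <- c(rep(0, 5), rep(1, 5), rep(2, 5))
b <- c(rep(1, 5), rep(1, 5), rep(2, 5))
d <- rbind(a, b)

# Approach from the answer: number of runs minus one, per row
changes_rle <- apply(d, 1, function(x) length(rle(x)$values) - 1)

# Alternative sketch: compare columns 2..n with columns 1..(n-1)
# and count the mismatches in each row.
# drop = FALSE keeps the result a matrix even if d has a single row.
changes_shift <- rowSums(d[, -1, drop = FALSE] != d[, -ncol(d), drop = FALSE])

changes_rle    # a: 2, b: 1
changes_shift  # a: 2, b: 1
```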

Hope this helps.

RHertel answered Oct 22 '22 16:10