
Count consecutive occurrences of a specific value in every row of a data frame in R

Tags:

dataframe

r

I've got a data.frame of monthly values of a variable for many locations (so many rows), and I want to count the number of consecutive months (i.e. consecutive cells) that have a value of zero. This would be easy if each row were just read left to right, but the added complication is that the end of the year is consecutive to the start of the year.

For example, in the shortened example dataset below (with seasons instead of months), location 1 has 3 consecutive '0' values, location 2 has 2, and location 3 has none.

df<-cbind(location= c(1,2,3),
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))

How can I count these consecutive zero values? I've looked at rle but I'm still none the wiser currently!

Many thanks for any help :)

kim1801 asked Nov 20 '25 22:11


1 Answer

You've identified the two forms the longest run can take: (1) somewhere in the middle of a row, or (2) split between the end and the beginning of a row. Hence you want to calculate each case and take the max, like so:

df<-cbind(
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))

#>      Winter Spring Summer Autumn
#> [1,]      0      0      0      3
#> [2,]      0      2      2      0
#> [3,]      3      4      7      4


# calculate the number of consecutive zeros at the start and end
startZeros  <-  apply(df,1,function(x)which.min(x==0)-1)
#> [1] 3 1 0
endZeros  <-  apply(df,1,function(x)which.min(rev(x==0))-1)
#> [1] 0 1 0

# calculate the longest run of zeros
longestRun  <-  apply(df,1,function(x){
                y = rle(x);
                max(y$lengths[y$values==0],0)})
#> [1] 3 1 0

# take the max of the two values
pmax(longestRun,startZeros +endZeros  )
#> [1] 3 2 0
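
Since rle does most of the work here, it may help to look at what it returns for a single row. This is only a small sketch using the values of row 2 from the matrix above; the names row2, runs and zero_runs are illustrative and not part of the original code.

```r
# Row 2 of the example matrix: Winter, Spring, Summer, Autumn
row2 <- c(0, 2, 2, 0)

# rle() splits the vector into runs of identical adjacent values
runs <- rle(row2)
runs$lengths  # length of each run:       1 2 1
runs$values   # value repeated in a run:  0 2 0

# keep only the runs whose value is zero, then take the longest;
# the extra 0 guards against rows that contain no zeros at all
zero_runs <- runs$lengths[runs$values == 0]
max(zero_runs, 0)  # 1
```

Read left to right, the longest zero run in this row is 1, which is why the start and end counts have to be added separately to get the wrapped answer of 2.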

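Of course, an even easier solution is to bind the matrix to itself, so that a run wrapping from the end of a row to its start becomes one contiguous run: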

longestRun  <-  apply(cbind(df,df),# tricky way to wrap the zeros from the start to the end
                      1,# the margin over which to apply the summary function
                      function(x){# the summary function
                          y = rle(x);
                          max(y$lengths[y$values==0],
                              0)# include zero in case there are no zeros in y$values
                      })

Note that the above solution works because my df does not include the location field (column).
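
If the data really is a data.frame that still carries the location column, the same doubling trick could be wrapped in a small helper along these lines. This is a sketch under assumptions: the function name longest_zero_run, its arguments data and id_col, and the extra all-zero row 4 in example_df are hypothetical and not from the question. It also caps the result at the number of value columns, because a row made up entirely of zeros would otherwise be counted twice after doubling.

```r
# Hypothetical helper: longest run of zeros per row, treating each row as circular.
# `data` and `id_col` are illustrative names, not from the original post.
longest_zero_run <- function(data, id_col = "location") {
  # drop the identifier column and work on a plain numeric matrix
  values <- as.matrix(data[, setdiff(colnames(data), id_col), drop = FALSE])
  n <- ncol(values)
  # doubling each row makes a run that wraps around the year end contiguous
  apply(cbind(values, values), 1, function(x) {
    runs <- rle(unname(x))
    # max(..., 0) handles rows without zeros; min(..., n) caps all-zero rows
    min(max(runs$lengths[runs$values == 0], 0), n)
  })
}

# Example data from the question, plus an assumed all-zero location 4
example_df <- data.frame(location = c(1, 2, 3, 4),
                         Winter   = c(0, 0, 3, 0),
                         Spring   = c(0, 2, 4, 0),
                         Summer   = c(0, 2, 7, 0),
                         Autumn   = c(3, 0, 4, 0))

longest_zero_run(example_df)  # 3 2 0 4
```

Without the cap, location 4 would report 8 rather than 4.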

Jthorpe answered Nov 22 '25 13:11


