Count sequences of numbers rowwise

Question

I have the following dataframe with 0, 1, and NAs for IDs A to E over a one year period:

dat <- data.frame(
 id = c("A", "B", "C", "D", "E"),
 jan = c(0, 0, NA, 1, 0),
 feb = c(0, 1, 1, 0, 0),
 mar = c(0, 0, 1, 0, 1),
 apr = c(0, NA, 0, NA, 1),
 may = c(0, NA, 0, 0, 0),
 jun = c(0, 0, 0, 0, 0),
 jul = c(0, 0, 0, 0, 1),
 aug = c(NA, 0, 0, 1, 1),
 sep = c(NA, 0, 0, 1, NA),
 okt = c(NA, 0, 0, 0, NA),
 nov = c(NA, 0, 0, 0, 1),
 dez = c(NA, 0, 0, 0, 0)
)

> dat
  id jan feb mar apr may jun jul aug sep okt nov dez
   A   0   0   0   0   0   0   0  NA  NA  NA  NA  NA
   B   0   1   0  NA  NA   0   0   0   0   0   0   0
   C  NA   1   1   0   0   0   0   0   0   0   0   0
   D   1   0   0  NA   0   0   0   1   1   0   0   0
   E   0   0   1   1   0   0   1   1  NA  NA   1   0

I would like to count the number of 1s for each ID over this one year period, but the following conditions need to be met:

The first occurrence of a 1 is always counted as 1
NAs should be treated as 0s
A second occurrence of a 1 is only counted, if it is preceded by six or more 0s/NAs

In my example, the count would be:

> dat
   id jan feb mar apr may jun jul aug sep okt nov dez     count
 1  A   0   0   0   0   0   0   0  NA  NA  NA  NA  NA      => 0
 2  B   0   1   0  NA  NA   0   0   0   0   0   0   0      => 1
 3  C  NA   1   1   0   0   0   0   0   0   0   0   0      => 1
 4  D   1   0   0  NA   0   0   0   1   1   0   0   0      => 2
 5  E   0   0   1   1   0   0   1   1  NA  NA   1   0      => 1

The function should be applied rowwise in the form of apply(dat[, -1], 1, my_fun) and return a vector containing the count (i.e. 0, 1, 1, 2, 1). Does anybody have an idea how to achieve this?

demirev · Accepted Answer

How about using rollapply from the zoo package:

library(zoo)
library(magrittr)

myfun <- function(y, pattern = c(0,0,0,0,0,0,1)){
    y[is.na(y)] <- 0 # to account for both 0s and NAs
    first <- sum(y[1:(length(pattern)-1)])!=0
    rest  <- y %>% as.numeric() %>% rollapply(7, identical, pattern) %>% sum
    return(first+rest)
}

apply(dat[,-1],1,myfun)

[1] 0 1 1 2 1

The rollapply part will match any sequence of six 0s followed by a 1 in each row.

The only thing left is to account for 1s in the first 6 months (which you want to count but won't be matched by the rollapply). This is done with the first row of myfun.

Florian · Answer

I am going to make use of the fact that your function can return a maximum of 2 per row, since there can never be more than one such sequences of six zeroes. It will reach the maximum if there is a sequence of at least six zeroes somewhere, which does not start at the beginning or end at the end of the row (since then it is surrounded by a 1 on both sides.)

yoursum <- function(x)
{
  x[is.na(x)]<-0
  booleans = with(rle(x),values==0 & lengths>5)
  if(any(booleans))
  {
    if(which(booleans)<length(booleans) & which(booleans)>1 )
      return(2)
  }

  if(any(x>0))
    return(1)
  else
    return(0)
}

apply(dat[,-1],1,yoursum)

Output:

[1] 0 1 1 2 1

Count sequences of numbers rowwise

Tags:

dataframe

r

piptoma

2 Answers

demirev

Florian

Recent Activity

Donate For Us

Count sequences of numbers rowwise

Tags:

dataframe

r

piptoma

2 Answers

demirev

Florian

Related questions

Recent Activity

Donate For Us