Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count sequences of numbers rowwise

Tags:

dataframe

r

I have the following dataframe with 0, 1, and NAs for IDs A to E over a one year period:

dat <- data.frame(
 id = c("A", "B", "C", "D", "E"),
 jan = c(0, 0, NA, 1, 0),
 feb = c(0, 1, 1, 0, 0),
 mar = c(0, 0, 1, 0, 1),
 apr = c(0, NA, 0, NA, 1),
 may = c(0, NA, 0, 0, 0),
 jun = c(0, 0, 0, 0, 0),
 jul = c(0, 0, 0, 0, 1),
 aug = c(NA, 0, 0, 1, 1),
 sep = c(NA, 0, 0, 1, NA),
 okt = c(NA, 0, 0, 0, NA),
 nov = c(NA, 0, 0, 0, 1),
 dez = c(NA, 0, 0, 0, 0)
)

> dat
  id jan feb mar apr may jun jul aug sep okt nov dez
   A   0   0   0   0   0   0   0  NA  NA  NA  NA  NA
   B   0   1   0  NA  NA   0   0   0   0   0   0   0
   C  NA   1   1   0   0   0   0   0   0   0   0   0
   D   1   0   0  NA   0   0   0   1   1   0   0   0
   E   0   0   1   1   0   0   1   1  NA  NA   1   0

I would like to count the number of 1s for each ID over this one year period, but the following conditions need to be met:

  • The first occurrence of a 1 is always counted as 1
  • NAs should be treated as 0s
  • A second occurrence of a 1 is only counted, if it is preceded by six or more 0s/NAs

In my example, the count would be:

> dat
   id jan feb mar apr may jun jul aug sep okt nov dez     count
 1  A   0   0   0   0   0   0   0  NA  NA  NA  NA  NA      => 0
 2  B   0   1   0  NA  NA   0   0   0   0   0   0   0      => 1
 3  C  NA   1   1   0   0   0   0   0   0   0   0   0      => 1
 4  D   1   0   0  NA   0   0   0   1   1   0   0   0      => 2
 5  E   0   0   1   1   0   0   1   1  NA  NA   1   0      => 1

The function should be applied rowwise in the form of apply(dat[, -1], 1, my_fun) and return a vector containing the count (i.e. 0, 1, 1, 2, 1). Does anybody have an idea how to achieve this?

like image 441
piptoma Avatar asked Jan 03 '23 16:01

piptoma


2 Answers

How about using rollapply from the zoo package:

library(zoo)
library(magrittr)

myfun <- function(y, pattern = c(0,0,0,0,0,0,1)){
    y[is.na(y)] <- 0 # to account for both 0s and NAs
    first <- sum(y[1:(length(pattern)-1)])!=0
    rest  <- y %>% as.numeric() %>% rollapply(7, identical, pattern) %>% sum
    return(first+rest)
}

apply(dat[,-1],1,myfun)

[1] 0 1 1 2 1

The rollapply part will match any sequence of six 0s followed by a 1 in each row.

The only thing left is to account for 1s in the first 6 months (which you want to count but won't be matched by the rollapply). This is done with the first row of myfun.

like image 198
demirev Avatar answered Jan 14 '23 17:01

demirev


I am going to make use of the fact that your function can return a maximum of 2 per row, since there can never be more than one such sequences of six zeroes. It will reach the maximum if there is a sequence of at least six zeroes somewhere, which does not start at the beginning or end at the end of the row (since then it is surrounded by a 1 on both sides.)

yoursum <- function(x)
{
  x[is.na(x)]<-0
  booleans = with(rle(x),values==0 & lengths>5)
  if(any(booleans))
  {
    if(which(booleans)<length(booleans) & which(booleans)>1 )
      return(2)
  }

  if(any(x>0))
    return(1)
  else
    return(0)
}

apply(dat[,-1],1,yoursum)

Output:

[1] 0 1 1 2 1
like image 35
Florian Avatar answered Jan 14 '23 18:01

Florian