I have the following dataframe with 0, 1, and NAs for IDs A to E over a one year period:
dat <- data.frame(
id = c("A", "B", "C", "D", "E"),
jan = c(0, 0, NA, 1, 0),
feb = c(0, 1, 1, 0, 0),
mar = c(0, 0, 1, 0, 1),
apr = c(0, NA, 0, NA, 1),
may = c(0, NA, 0, 0, 0),
jun = c(0, 0, 0, 0, 0),
jul = c(0, 0, 0, 0, 1),
aug = c(NA, 0, 0, 1, 1),
sep = c(NA, 0, 0, 1, NA),
okt = c(NA, 0, 0, 0, NA),
nov = c(NA, 0, 0, 0, 1),
dez = c(NA, 0, 0, 0, 0)
)
> dat
id jan feb mar apr may jun jul aug sep okt nov dez
A 0 0 0 0 0 0 0 NA NA NA NA NA
B 0 1 0 NA NA 0 0 0 0 0 0 0
C NA 1 1 0 0 0 0 0 0 0 0 0
D 1 0 0 NA 0 0 0 1 1 0 0 0
E 0 0 1 1 0 0 1 1 NA NA 1 0
I would like to count the number of 1s for each ID over this one year period, but the following conditions need to be met:
In my example, the count would be:
> dat
id jan feb mar apr may jun jul aug sep okt nov dez count
1 A 0 0 0 0 0 0 0 NA NA NA NA NA => 0
2 B 0 1 0 NA NA 0 0 0 0 0 0 0 => 1
3 C NA 1 1 0 0 0 0 0 0 0 0 0 => 1
4 D 1 0 0 NA 0 0 0 1 1 0 0 0 => 2
5 E 0 0 1 1 0 0 1 1 NA NA 1 0 => 1
The function should be applied rowwise in the form of apply(dat[, -1], 1, my_fun)
and return a vector containing the count (i.e. 0, 1, 1, 2, 1
). Does anybody have an idea how to achieve this?
How about using rollapply
from the zoo package:
library(zoo)
library(magrittr)
myfun <- function(y, pattern = c(0,0,0,0,0,0,1)){
y[is.na(y)] <- 0 # to account for both 0s and NAs
first <- sum(y[1:(length(pattern)-1)])!=0
rest <- y %>% as.numeric() %>% rollapply(7, identical, pattern) %>% sum
return(first+rest)
}
apply(dat[,-1],1,myfun)
[1] 0 1 1 2 1
The rollapply part will match any sequence of six 0s followed by a 1 in each row.
The only thing left is to account for 1s in the first 6 months (which you want to count but won't be matched by the rollapply). This is done with the first row of myfun
.
I am going to make use of the fact that your function can return a maximum of 2 per row, since there can never be more than one such sequences of six zeroes. It will reach the maximum if there is a sequence of at least six zeroes somewhere, which does not start at the beginning or end at the end of the row (since then it is surrounded by a 1 on both sides.)
yoursum <- function(x)
{
x[is.na(x)]<-0
booleans = with(rle(x),values==0 & lengths>5)
if(any(booleans))
{
if(which(booleans)<length(booleans) & which(booleans)>1 )
return(2)
}
if(any(x>0))
return(1)
else
return(0)
}
apply(dat[,-1],1,yoursum)
Output:
[1] 0 1 1 2 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With