Count a sequence to include NA values

Tags:

sequence

Here is a sample data frame that resembles a larger data set:

Day <- c(1, 2, NA, 3, 4, NA, NA, NA, NA, NA, 1, 2, 3, NA, NA, NA, NA, 1, 2, NA, NA, 3, 4, 5)
y   <- rpois(length(Day), 2)
z   <- seq_along(Day) + 500
df  <- data.frame(z, Day, y)

A run of 4 or more consecutive missing values (NAs) in the Day column marks a gap between cohorts in the study. A run of fewer than 4 NAs is still considered part of the surrounding cohort (e.g., row 3 is part of cohort 1, but row 8 is not). The sample data frame contains 3 cohorts (Cohort 1: rows 1-5, Cohort 2: rows 11-13, and Cohort 3: rows 18-24). I would like to add one column with the cohort number and another with the study day within each cohort. Here is the code I used:

library(dplyr)
CheckNA        <- rle(is.na(df$Day))
CheckNA$values <- CheckNA$lengths >= 4 & CheckNA$values == 1
ListNA         <- rep(CheckNA$values, CheckNA$lengths)
df$Co          <- rep(c(1, NA, 2, NA, 3), rle(ListNA)$lengths) %>% as.factor()

df <- df %>% 
  group_by (Co) %>% 
  mutate(CoDay = seq(Co)) %>% 
  as.data.frame()

df$CoDay <- ifelse(is.na(df$Co), NA, df$CoDay)

Is there a more efficient way to accomplish this task? I'm specifically looking for a way to avoid listing the cohort numbers by hand, since my actual data set will have more than 10 cohorts. Currently I hard-code the sequence to be repeated: c(1, NA, 2, NA, 3).

asked Apr 06 '17 20:04

Tania Alarcon

1 Answer

I'd make a change here:

CheckNA        <- rle(is.na(df$Day))
CheckNA$values <- CheckNA$lengths >= 4 & CheckNA$values == 1
CheckNA$values <- ifelse(!CheckNA$values, cumsum(CheckNA$values)+1, NA)
df$Co <- inverse.rle(CheckNA)

I kept the first two lines the same, then used cumsum() to assign a new ID at each break, so you don't have to hard-code any values. With the new values in place, you can use inverse.rle() to expand each ID back out to its rows, much the same way you used rep().
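To make the individual steps easier to follow, here is a small sketch on a toy vector. The vector `x`, the helper name `is_gap`, and the gap threshold of 2 are made up for illustration and are not part of the question's data.

```r
# Toy input: one short NA run (stays inside a cohort) and one NA run of
# length 2 (treated as a gap here, since the illustrative threshold is 2)
x <- c(1, NA, 2, NA, NA, 1, 2)

r <- rle(is.na(x))
# r$lengths: 1 1 1 2 2
# r$values:  FALSE TRUE FALSE TRUE FALSE

# Flag only the NA runs that are long enough to count as a gap
is_gap <- r$lengths >= 2 & r$values
# is_gap: FALSE FALSE FALSE TRUE FALSE

# cumsum() counts the gaps seen so far; adding 1 gives the current cohort ID.
# Gap runs themselves get NA.
r$values <- ifelse(is_gap, NA, cumsum(is_gap) + 1)
# r$values: 1 1 1 NA 2

# Expand the per-run IDs back out to one value per element
ids <- inverse.rle(r)
ids
# 1 1 1 NA NA 2 2
```

The short NA run at position 2 keeps cohort ID 1 because it never reaches the threshold, so it does not increment the running count.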

If you turn that into a function, you can clean up the dplyr bits:

id_NA_break <- function(x) {
  CheckNA        <- rle(is.na(x))
  CheckNA$values <- CheckNA$lengths >= 4 & CheckNA$values == 1
  CheckNA$values <- ifelse(!CheckNA$values, cumsum(CheckNA$values)+1, NA)
  inverse.rle(CheckNA)  
}

df  <- data.frame(z, Day, y)
df %>% 
  mutate(Co=id_NA_break(Day)) %>%
  group_by(Co) %>% 
  mutate(CoDay = ifelse(is.na(Co), NA, seq_along(Co)))
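As a quick check against the cohorts listed in the question, here is a base R sketch of the same idea. The function name `id_NA_break2` and its `min_gap` argument are hypothetical additions for illustration. The use of ave() for the day counter is just one possible alternative to the dplyr pipeline above.

```r
# Hypothetical variant of id_NA_break() with the gap length as an argument
id_NA_break2 <- function(x, min_gap = 4) {
  r        <- rle(is.na(x))
  is_gap   <- r$values & r$lengths >= min_gap   # long NA runs only
  r$values <- ifelse(is_gap, NA, cumsum(is_gap) + 1)
  inverse.rle(r)
}

Day <- c(1, 2, NA, 3, 4, NA, NA, NA, NA, NA, 1, 2, 3, NA, NA, NA, NA,
         1, 2, NA, NA, 3, 4, 5)
Co  <- id_NA_break2(Day)

# Day counter within each cohort; rows that fall in a gap stay NA
CoDay     <- rep(NA_integer_, length(Co))
in_cohort <- !is.na(Co)
CoDay[in_cohort] <- ave(seq_along(Co)[in_cohort], Co[in_cohort],
                        FUN = seq_along)

# Row numbers belonging to each cohort (gap rows are dropped by split())
split(seq_along(Co), Co)
```

On the sample data this groups rows 1-5, 11-13, and 18-24, matching the three cohorts described in the question. One caveat: if the data begin with a gap, the first cohort is numbered 2 rather than 1, because the leading gap is already counted by cumsum().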

answered Sep 29 '22 09:09

MrFlick
