Fill missing sequence values with dplyr

Tags: r, dplyr

I have a data frame with missing values in "SNAP_ID". I'd like to fill each missing value with a floating-point value that continues a 0.1 sequence from the previous non-missing value (perhaps with lag()?). I would really like to achieve this using just dplyr if possible.

Assumptions:

There will never be missing data in the first or last row; I'm generating the missing dates from the days missing between the min and max of the data set.
There can be multiple gaps in the data set

Current data:

                  end SNAP_ID
1 2015-06-26 12:59:00     365
2 2015-06-26 13:59:00     366
3 2015-06-27 00:01:00      NA
4 2015-06-27 23:00:00      NA
5 2015-06-28 00:01:00      NA
6 2015-06-28 23:00:00      NA
7 2015-06-29 09:00:00     367
8 2015-06-29 09:59:00     368

What I want to achieve:

                  end SNAP_ID
1 2015-06-26 12:59:00   365.0
2 2015-06-26 13:59:00   366.0
3 2015-06-27 00:01:00   366.1
4 2015-06-27 23:00:00   366.2
5 2015-06-28 00:01:00   366.3
6 2015-06-28 23:00:00   366.4
7 2015-06-29 09:00:00   367.0
8 2015-06-29 09:59:00   368.0
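To pin down the target rule, here is a minimal base-R sketch that reproduces the desired output with an explicit loop. The helper name `fill_seq_loop` and its `step` argument are hypothetical, not from the question; it assumes the rows are already sorted by `end` and that the first row is not NA, as stated above.

```r
# Hypothetical helper: fill each NA with the value just above it plus `step`.
# Because x[i - 1] has already been filled when row i is visited, a run of
# NAs becomes prev + step, prev + 2 * step, ...
fill_seq_loop <- function(x, step = 0.1) {
  for (i in seq_along(x)) {
    if (is.na(x[i]) && i > 1) {
      x[i] <- x[i - 1] + step
    }
  }
  x
}

snap <- c(365, 366, NA, NA, NA, NA, 367, 368)
fill_seq_loop(snap)
#> [1] 365.0 366.0 366.1 366.2 366.3 366.4 367.0 368.0
```

The loop is easy to reason about but is not dplyr; it serves only as a reference to check a vectorised version against.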

As a data frame:

df <- structure(list(end = structure(c(1435323540, 1435327140, 1435363260, 
    1435446000, 1435449660, 1435532400, 1435568400, 1435571940), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), SNAP_ID = c(365, 366, NA, NA, NA, NA, 367, 368)), .Names = c("end", 
    "SNAP_ID"), row.names = c(NA, -8L), class = "data.frame")

This was my attempt at achieving this goal, but it only works for the first missing value:

library(dplyr)

df %>% 
  arrange(end) %>%
  mutate(SNAP_ID=ifelse(is.na(SNAP_ID),lag(SNAP_ID)+0.1,SNAP_ID))

                  end SNAP_ID
1 2015-06-26 12:59:00   365.0
2 2015-06-26 13:59:00   366.0
3 2015-06-27 00:01:00   366.1
4 2015-06-27 23:00:00      NA
5 2015-06-28 00:01:00      NA
6 2015-06-28 23:00:00      NA
7 2015-06-29 09:00:00   367.0
8 2015-06-29 09:59:00   368.0
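A short sketch of why only the first NA gets filled: `lag()` is computed on the column as it was before `mutate()` ran, so for the second and later NAs in a run the lagged value is itself NA. The four-element vector below is made up for illustration.

```r
library(dplyr)

x <- c(366, NA, NA, NA)

# lag() shifts the *original* vector; it does not see values filled earlier
# in the same mutate() call.
lag(x)
#> [1]  NA 366  NA  NA

ifelse(is.na(x), lag(x) + 0.1, x)
#> [1] 366.0 366.1    NA    NA
```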

The outstanding answer from @mathematical.coffee below:

df %>% 
  arrange(end) %>%
  group_by(tmp=cumsum(!is.na(SNAP_ID))) %>%
  mutate(SNAP_ID=SNAP_ID[1] + 0.1*(0:(length(SNAP_ID)-1))) %>%
  ungroup() %>%
  select(-tmp)


asked Jul 16 '15 22:07

Tyler Muth

1 Answer

EDIT: new version works for any number of NA runs. This one doesn't need zoo, either.

First, notice that tmp=cumsum(!is.na(SNAP_ID)) groups the SNAP_IDs such that each group (rows sharing the same tmp) consists of one non-NA value followed by a run of NA values.
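The grouping key can be checked on its own in base R, using the SNAP_ID values from the question:

```r
snap <- c(365, 366, NA, NA, NA, NA, 367, 368)

# !is.na() is TRUE at each observed ID, so cumsum() steps up there;
# every NA inherits the group number of the last observed value above it.
tmp <- cumsum(!is.na(snap))
tmp
#> [1] 1 2 2 2 2 2 3 4

# Each group is one observed value followed by its (possibly empty) run of NAs
groups <- split(snap, tmp)
lengths(groups)
```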

Then group by this variable and add 0, 0.1, 0.2, ... to the first SNAP_ID of each group to fill out the NAs:

df %>% 
  arrange(end) %>%
  group_by(tmp=cumsum(!is.na(SNAP_ID))) %>%
  mutate(SNAP_ID=SNAP_ID[1] + 0.1*(0:(length(SNAP_ID)-1)))

                  end SNAP_ID tmp
1 2015-06-26 12:59:00   365.0   1
2 2015-06-26 13:59:00   366.0   2
3 2015-06-27 00:01:00   366.1   2
4 2015-06-27 23:00:00   366.2   2
5 2015-06-28 00:01:00   366.3   2
6 2015-06-28 23:00:00   366.4   2
7 2015-06-29 09:00:00   367.0   3
8 2015-06-29 09:59:00   368.0   4

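Then you can drop the tmp column afterwards by adding %>% ungroup() %>% select(-tmp) to the end. The ungroup() is needed because select() on grouped data always keeps the grouping column.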
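A sketch of the complete pipeline, run on a small hypothetical data frame (`df2`, invented here, not the question's data) with two separate gaps to confirm that each run restarts its own 0.1 sequence:

```r
library(dplyr)

# Hypothetical data with two separate gaps (not from the question)
df2 <- data.frame(
  end = as.POSIXct("2015-06-26 00:00:00", tz = "UTC") + 3600 * (0:7),
  SNAP_ID = c(365, NA, NA, 366, 367, NA, 368, 369)
)

filled <- df2 %>%
  arrange(end) %>%
  group_by(tmp = cumsum(!is.na(SNAP_ID))) %>%
  mutate(SNAP_ID = SNAP_ID[1] + 0.1 * (0:(length(SNAP_ID) - 1))) %>%
  ungroup() %>%   # drop grouping first, otherwise select() keeps tmp
  select(-tmp)

filled$SNAP_ID
```

The first gap becomes 365.1, 365.2 and the second becomes 367.1, so the offset resets at every observed ID.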

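EDIT: this is the old version, which doesn't work for subsequent runs of NAs because cumsum(is.na(SNAP_ID)) keeps counting across the whole column instead of restarting after each non-NA value.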

If your aim is to fill each NA with the previous value + 0.1, you can use zoo's na.locf (which carries the last non-NA value forward), along with cumsum(is.na(SNAP_ID))*0.1 to add the increasing offset.

library(zoo)
df %>% 
  arrange(end) %>%
  mutate(SNAP_ID=ifelse(is.na(SNAP_ID),
                       na.locf(SNAP_ID) + cumsum(is.na(SNAP_ID))*0.1,
                       SNAP_ID))
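A base-R sketch of where the old version goes wrong, on a made-up vector with two NA runs. To avoid depending on zoo, the last-observation-carried-forward step is done by indexing the observed values with cumsum(!is.na(x)); this is a swapped-in stand-in for na.locf and assumes the first element is not NA.

```r
x <- c(365, NA, NA, 366, NA, 367)

# Last observation carried forward without zoo: index the observed values
# by the running count of observed values (assumes x[1] is not NA).
locf <- x[!is.na(x)][cumsum(!is.na(x))]

# Old approach: a single running NA count that never resets.
old <- ifelse(is.na(x), locf + cumsum(is.na(x)) * 0.1, x)

# Count NAs within each run instead (resets at every observed value).
run_pos <- ave(as.numeric(is.na(x)), cumsum(!is.na(x)), FUN = cumsum)
new <- ifelse(is.na(x), locf + run_pos * 0.1, x)

old
#> [1] 365.0 365.1 365.2 366.0 366.3 367.0
new
#> [1] 365.0 365.1 365.2 366.0 366.1 367.0
```

The NA after 366 gets 366.3 in the old version because the count is still 3 from the first run; resetting the count per group gives 366.1.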


answered Oct 11 '22 18:10

mathematical.coffee
