Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Insert rows for missing dates/times

I am new to R but have turned to it to solve a problem with a large data set I am trying to process. Currently I have a 4 columns of data (Y values) set against minute-interval timestamps (month/day/year hour:min) (X values) as below:

    timestamp          tr            tt         sr         st   1   9/1/01 0:00   1.018269e+02   -312.8622   -1959.393   4959.828   2   9/1/01 0:01   1.023567e+02   -313.0002   -1957.755   4958.935   3   9/1/01 0:02   1.018857e+02   -313.9406   -1956.799   4959.938   4   9/1/01 0:03   1.025463e+02   -310.9261   -1957.347   4961.095   5   9/1/01 0:04   1.010228e+02   -311.5469   -1957.786   4959.078 

The problem I have is that some timestamp values are missing - e.g. there may be a gap between 9/1/01 0:13 and 9/1/01 0:27 and such gaps are irregular through the data set. I need to put several of these series into the same database and because the missing values are different for each series, the dates do not currently align on each row.

I would like to generate rows for these missing timestamps and fill the Y columns with blank values (no data, not zero), so that I have a continuous time series.

I'm honestly not quite sure where to start (not really used R before so learning as I go along!) but any help would be much appreciated. I have thus far installed chron and zoo, since it seems they might be useful.

Thanks!

like image 980
James A Avatar asked May 28 '13 08:05

James A


2 Answers

This is an old question, but I just wanted to post a dplyr way of handling this, as I came across this post while searching for an answer to a similar problem. I find it more intuitive and easier on the eyes than the zoo approach.

library(dplyr)  ts <- seq.POSIXt(as.POSIXct("2001-09-01 0:00",'%m/%d/%y %H:%M'), as.POSIXct("2001-09-01 0:07",'%m/%d/%y %H:%M'), by="min")  ts <- seq.POSIXt(as.POSIXlt("2001-09-01 0:00"), as.POSIXlt("2001-09-01 0:07"), by="min") ts <- format.POSIXct(ts,'%m/%d/%y %H:%M')  df <- data.frame(timestamp=ts)  data_with_missing_times <- full_join(df,original_data)     timestamp     tr tt sr st 1 09/01/01 00:00 15 15 78 42 2 09/01/01 00:01 20 64 98 87 3 09/01/01 00:02 31 84 23 35 4 09/01/01 00:03 21 63 54 20 5 09/01/01 00:04 15 23 36 15 6 09/01/01 00:05 NA NA NA NA 7 09/01/01 00:06 NA NA NA NA 8 09/01/01 00:07 NA NA NA NA 

Also using dplyr, this makes it easier to do something like change all those missing values to something else, which came in handy for me when plotting in ggplot.

data_with_missing_times %>% group_by(timestamp) %>% mutate_each(funs(ifelse(is.na(.),0,.)))     timestamp     tr tt sr st 1 09/01/01 00:00 15 15 78 42 2 09/01/01 00:01 20 64 98 87 3 09/01/01 00:02 31 84 23 35 4 09/01/01 00:03 21 63 54 20 5 09/01/01 00:04 15 23 36 15 6 09/01/01 00:05  0  0  0  0 7 09/01/01 00:06  0  0  0  0 8 09/01/01 00:07  0  0  0  0 
like image 200
lbollar Avatar answered Sep 17 '22 16:09

lbollar


I think the easiest thing ist to set Date first as already described, convert to zoo, and then just set a merge:

df$timestamp<-as.POSIXct(df$timestamp,format="%m/%d/%y %H:%M")  df1.zoo<-zoo(df[,-1],df[,1]) #set date to Index  df2 <- merge(df1.zoo,zoo(,seq(start(df1.zoo),end(df1.zoo),by="min")), all=TRUE) 

Start and end are given from your df1 (original data) and you are setting by - e.g min - as you need for your example. all=TRUE sets all missing values at the missing dates to NAs.

like image 42
Herr Student Avatar answered Sep 16 '22 16:09

Herr Student