 

How to create "NA" for missing data in a time series

I have several files of data that look like this:

X code year month day pp  
1 4515 1953     6   1  0  
2 4515 1953     6   2  0  
3 4515 1953     6   3  0  
4 4515 1953     6   4  0  
5 4515 1953     6   5  3.5

Sometimes data is missing, but there are no NAs in the file: the rows simply don't exist. I need to create NAs wherever data is missing. I thought I could start by identifying where that happens, by converting the data to a zoo object and checking for strict regularity (I have never used zoo before). I used the following code:

z.date<-paste(CET$year, CET$month, CET$day, sep="/")
z <- read.zoo(CET,  order.by= z.date )
reg<-is.regular(z, strict = TRUE)

But the answer is always TRUE!

Can anyone tell me why it is not working? Or, even better, tell me a way to create NAs where the data is missing (with or without the zoo package)?

thanks

sbg asked May 19 '11 12:05



2 Answers

The seq function has some interesting features that you can use to easily generate a complete sequence of dates. For example, the following code can be used to generate a sequence of dates starting on April 25:

Edit: This feature is documented in ?seq.Date

start <- as.Date("2011/04/25")
full <- seq(start, by = "1 day", length.out = 15)
full

 [1] "2011-04-25" "2011-04-26" "2011-04-27" "2011-04-28" "2011-04-29"
 [6] "2011-04-30" "2011-05-01" "2011-05-02" "2011-05-03" "2011-05-04"
[11] "2011-05-05" "2011-05-06" "2011-05-07" "2011-05-08" "2011-05-09"

Now use the same principle to generate some data with "missing" rows, by generating the sequence for every 2nd day:

partial <- data.frame(
    date=seq(start, by='2 day', length.out=6),
    value=1:6
)
partial

        date value
1 2011-04-25     1
2 2011-04-27     2
3 2011-04-29     3
4 2011-05-01     4
5 2011-05-03     5
6 2011-05-05     6

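To answer your question, you can combine the match function with vector subscripting to create a dataset with NAs. match(full, date) returns, for each date in full, its position in partial$date, or NA if that date is absent; subscripting value with those positions then yields NA for every missing date: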

with(partial, value[match(full, date)])
 [1]  1 NA  2 NA  3 NA  4 NA  5 NA  6 NA NA NA NA

To combine this result with the full sequence of dates:

data.frame(Date=full, value=with(partial, value[match(full, date)]))
         Date value
1  2011-04-25     1
2  2011-04-26    NA
3  2011-04-27     2
4  2011-04-28    NA
5  2011-04-29     3
6  2011-04-30    NA
7  2011-05-01     4
8  2011-05-02    NA
9  2011-05-03     5
10 2011-05-04    NA
11 2011-05-05     6
12 2011-05-06    NA
13 2011-05-07    NA
14 2011-05-08    NA
15 2011-05-09    NA
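
The same idea can be applied to data shaped like the question's. The following is only a minimal sketch: the CET data frame below is a made-up sample (the rows for 3 and 4 June are deliberately left out), and merge() with all.x = TRUE is used in place of match(), which is one of several equivalent ways to do it.

```r
# Hypothetical sample shaped like the question's data; 3 and 4 June are missing
CET <- data.frame(
  code  = 4515,
  year  = 1953,
  month = 6,
  day   = c(1, 2, 5),
  pp    = c(0, 0, 3.5)
)

# Build a proper Date column from the year/month/day columns
CET$date <- as.Date(paste(CET$year, CET$month, CET$day, sep = "-"),
                    format = "%Y-%m-%d")

# Complete daily sequence spanning the observed period
full <- data.frame(date = seq(min(CET$date), max(CET$date), by = "day"))

# Left join: every date in 'full' is kept, unmatched dates get NA
filled <- merge(full, CET, by = "date", all.x = TRUE)
filled[, c("date", "pp")]
```

Note that every column coming from CET (code, year, month, day, pp) is NA on the inserted rows, so columns such as code may need to be refilled afterwards.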
Andrie answered Oct 25 '22 13:10



In the zoo package "regular" means that the series is equally spaced except possibly for some missing entries. The zooreg class in the zoo package is specifically for that type of series. Note that the set of all regular series includes the set of all equally spaced series but is strictly larger.

The is.regular function checks whether a given series is regular. That is, could the series be made equally spaced by inserting NAs for the missing entries?
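
To illustrate the distinction, here is a small sketch on a made-up daily series with two days missing. It assumes the index is a Date vector passed to the zoo() constructor through its order.by argument; the values and dates are hypothetical.

```r
library(zoo)

# Hypothetical daily series with 3 and 4 June 1953 missing
dates <- as.Date(c("1953-06-01", "1953-06-02", "1953-06-05"))
z <- zoo(c(0, 0, 3.5), order.by = dates)

# Spacing between consecutive observations, in days
gaps <- as.numeric(diff(index(z)))

# Regular in zoo's sense: gaps are allowed as long as they fit a common spacing
loose <- is.regular(z)

# Strictly regular: every gap must be identical
strict <- is.regular(z, strict = TRUE)

c(loose = loose, strict = strict)
```

With gaps of 1 and 3 days, the series should count as regular but not as strictly regular.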

Regarding your last question, it's a FAQ. See FAQ #13 in the zoo FAQ, available from the zoo CRAN page or from within R via:

vignette("zoo-faq") 

Also in FAQ #13 there is some illustrative code.
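
As a rough sketch of one common way to do this with zoo (not necessarily the code given in the FAQ), a series can be merged with a zero-width zoo object indexed by the complete sequence of dates. The series below is a made-up example.

```r
library(zoo)

# Hypothetical daily series with 3 and 4 June 1953 missing
dates <- as.Date(c("1953-06-01", "1953-06-02", "1953-06-05"))
z <- zoo(c(0, 0, 3.5), order.by = dates)

# Complete daily grid covering the observed period
grid <- seq(start(z), end(z), by = "day")

# Merging with an empty (zero-width) zoo series on the full grid
# adds the missing dates and fills their values with NA
z_full <- merge(z, zoo(, grid))
z_full
```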

G. Grothendieck answered Oct 25 '22 12:10
