How to create "NA" for missing data in a time series

Tags: r, missing-data, time-series

I have several files of data that look like this:

X code year month day pp  
1 4515 1953     6   1  0  
2 4515 1953     6   2  0  
3 4515 1953     6   3  0  
4 4515 1953     6   4  0  
5 4515 1953     6   5  3.5

Sometimes data is missing, but there are no NAs in the file: the rows simply don't exist. I need to create NAs wherever a day is missing. I thought I could start by identifying where that occurs by converting the data to a zoo object and checking for strict regularity (I have never used zoo before). I used the following code:

z.date<-paste(CET$year, CET$month, CET$day, sep="/")
z <- read.zoo(CET,  order.by= z.date )
reg<-is.regular(z, strict = TRUE)

But the result is always TRUE!

Can anyone tell me why this is not working? Or, even better, show me a way to create NAs where the data is missing (with or without the zoo package)?

thanks


asked May 19 '11 12:05

sbg

2 Answers

The seq function has a method for dates (documented in ?seq.Date) that makes it easy to generate a complete sequence of dates. For example, the following code generates 15 consecutive dates starting on April 25, 2011:

start <- as.Date("2011-04-25")
full <- seq(start, by = "1 day", length.out = 15)
full

 [1] "2011-04-25" "2011-04-26" "2011-04-27" "2011-04-28" "2011-04-29"
 [6] "2011-04-30" "2011-05-01" "2011-05-02" "2011-05-03" "2011-05-04"
[11] "2011-05-05" "2011-05-06" "2011-05-07" "2011-05-08" "2011-05-09"

Now use the same principle to generate some data with "missing" rows, by generating the sequence for every 2nd day:

partial <- data.frame(
    date=seq(start, by='2 days', length.out=6),
    value=1:6
)
partial

        date value
1 2011-04-25     1
2 2011-04-27     2
3 2011-04-29     3
4 2011-05-01     4
5 2011-05-03     5
6 2011-05-05     6

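To answer your question, use the match function together with vector subscripting. match(full, date) returns, for each date in full, its position in partial$date, or NA when that date is absent. Subscripting value with those positions then gives NA for every missing date: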

with(partial, value[match(full, date)])
 [1]  1 NA  2 NA  3 NA  4 NA  5 NA  6 NA NA NA NA

To combine this result with the full sequence of dates:

data.frame(Date=full, value=with(partial, value[match(full, date)]))
         Date value
1  2011-04-25     1
2  2011-04-26    NA
3  2011-04-27     2
4  2011-04-28    NA
5  2011-04-29     3
6  2011-04-30    NA
7  2011-05-01     4
8  2011-05-02    NA
9  2011-05-03     5
10 2011-05-04    NA
11 2011-05-05     6
12 2011-05-06    NA
13 2011-05-07    NA
14 2011-05-08    NA
15 2011-05-09    NA
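
Applied to data shaped like the sample in the question, the same idea might look like the sketch below. The data frame CET is rebuilt by hand here with 3 and 4 June left out, which is an assumption for illustration, and merge is swapped in for match so that all columns are carried along at once:

```r
# Hypothetical sample mirroring the question's columns; 3 and 4 June are missing
CET <- data.frame(
  code  = 4515,
  year  = 1953,
  month = 6,
  day   = c(1, 2, 5),
  pp    = c(0, 0, 3.5)
)

# Build a proper Date column from the year, month and day columns
CET$date <- as.Date(paste(CET$year, CET$month, CET$day, sep = "-"),
                    format = "%Y-%m-%d")

# Complete daily sequence spanning the observed range
full <- data.frame(date = seq(min(CET$date), max(CET$date), by = "day"))

# Left join onto the full sequence: dates absent from CET get NA in the other columns
filled <- merge(full, CET, by = "date", all.x = TRUE)
filled$pp
```

Note that code, year, month and day are also NA in the inserted rows. If you need them, they can be refilled from the date column afterwards.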


answered Oct 25 '22 13:10

Andrie

In the zoo package "regular" means that the series is equally spaced except possibly for some missing entries. The zooreg class in the zoo package is specifically for that type of series. Note that the set of all regular series includes the set of all equally spaced series but is strictly larger.

The is.regular function checks whether a given series is regular. That is, is the series amenable to making it equally spaced if one inserted NAs for the missing entries?
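
As a small illustration of the difference, here is a sketch on a made-up daily series with a two-day gap. It assumes the index is a proper Date; the values and dates are hypothetical:

```r
library(zoo)

# Hypothetical daily series; 3 and 4 June are missing
z <- zoo(c(0, 0, 3.5),
         as.Date(c("1953-06-01", "1953-06-02", "1953-06-05")))

# Default: gaps are allowed as long as all spacings are multiples of the smallest one
loose <- is.regular(z)

# strict = TRUE: every spacing must be identical, so the gap is detected
strict <- is.regular(z, strict = TRUE)

c(loose = loose, strict = strict)
```

Here the spacings are 1 and 3 days, so the default check passes while the strict check does not.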

Regarding your last question, it's a FAQ. See FAQ #13 in the zoo FAQ, available from the zoo CRAN page or from within R via:

vignette("zoo-faq")

Also in FAQ #13 there is some illustrative code.
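
For readers who want something to try right away, one common zoo idiom for this is to merge the series with a zero-width zoo object indexed by the full date grid. The sketch below is not copied from the FAQ, and the sample series is made up:

```r
library(zoo)

# Hypothetical daily series; 3 and 4 June are missing
z <- zoo(c(0, 0, 3.5),
         as.Date(c("1953-06-01", "1953-06-02", "1953-06-05")))

# Full daily grid from the first to the last observed date
g <- seq(start(z), end(z), by = "day")

# zoo(, g) has an index but no data; merging with it adds the missing dates as NA
zfull <- merge(z, zoo(, g))
zfull
```

The result is again a zoo object, so functions such as na.approx or na.locf can be applied to it if the gaps need to be filled later.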

answered Oct 25 '22 12:10

G. Grothendieck
