
Fill in missing year in ordered list of dates

I have collected some time series data from the web, and the timestamps I got look like this:

24 Jun 
21 Mar
20 Jan 
10 Dec
20 Jun 
20 Jan
10 Dec 
...

The interesting part is that the year is missing from the data. However, all the records are ordered from newest to oldest, so you can infer the year of each record and fill it in. After imputing, the data should look like this:

24 Jun 2014
21 Mar 2014
20 Jan 2014
10 Dec 2013 
20 Jun 2013
20 Jan 2013
10 Dec 2012
...

Before rolling up my sleeves and writing a for loop with nested logic: is there an easy way, ideally something that works out of the box in R, to impute the missing year?

Thanks a lot for any suggestion!

B.Mr.W. asked Sep 02 '14 21:09


1 Answer

Here's one idea

## Make data easily reproducible
df <- data.frame(day=c(24, 21, 20, 10, 20, 20, 10),
                 month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec"))


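## Convert each month-day combo to its day of the year (the "%j" or
## "julian" day). The dummy year 2012 is a leap year, so 29 Feb would parse too.
datestring <- paste("2012", match(df[[2]], month.abb), df[[1]], sep = "-")
date <- strptime(datestring, format = "%Y-%m-%d")
julian <- as.integer(strftime(date, format = "%j"))

## The records run backward in time, so the day of the year can only increase
## between two consecutive observations when we cross into the previous year.
## 2014 is the year of the first (most recent) record.
df$year <- 2014 - cumsum(diff(c(julian[1], julian)) > 0)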

## Check that it worked
df
#   day month year
# 1  24   Jun 2014
# 2  21   Mar 2014
# 3  20   Jan 2014
# 4  10   Dec 2013
# 5  20   Jun 2013
# 6  20   Jan 2013
# 7  10   Dec 2012
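The same idea can be wrapped into a reusable function that starts from the raw "DD Mon" strings and returns proper Date objects. This is a minimal sketch, not part of the answer above: the name `impute_year` and its `latest_year` argument are hypothetical, it assumes the entries are ordered newest to oldest and that the year of the first entry is known, and it swaps the strptime/"%j" step for a simpler sort key (`month * 100 + day`), which orders dates within a year the same way the day of the year does. It does not check for 29 Feb landing in a non-leap year.

```r
## Hypothetical helper (not from the answer above): impute years for
## "DD Mon" strings ordered from newest to oldest.
impute_year <- function(x, latest_year) {
  # Split "24 Jun" into day and month parts; trimws() drops stray spaces
  parts <- strsplit(trimws(x), "[[:space:]]+")
  day   <- as.integer(vapply(parts, `[`, character(1), 1))
  month <- match(vapply(parts, `[`, character(1), 2), month.abb)
  if (anyNA(day) || anyNA(month)) stop("could not parse some entries")

  # month * 100 + day sorts dates within a year like the day of the year,
  # without needing strptime()
  key <- month * 100L + day

  # Going back in time, the key can only increase when we cross into the
  # previous year, so each increase moves us back one year
  year <- as.integer(latest_year - cumsum(c(0L, diff(key) > 0)))

  as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
}

raw <- c("24 Jun ", "21 Mar", "20 Jan ", "10 Dec", "20 Jun ", "20 Jan", "10 Dec ")
dates <- impute_year(raw, latest_year = 2014)
format(dates, "%d %b %Y")
```

Returning Date objects rather than a separate year column means the result can be sorted, differenced, or used directly as a time index.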
Josh O'Brien answered Nov 10 '22 08:11