
How do I get year and month when day is invalid without fixing the day myself?

Tags: date, r

I have some data that looks a bit like this:

require(zoo)

X <- rbind(c(date='20111001', fmt='%Y%m%d'),
            c('20111031', '%Y%m%d'),
            c('201110', '%Y%m'),
            c('102011', '%m%Y'),
            c('31/10/2011', '%d/%m/%Y'),
            c('20111000', '%Y%m%d'))
print(X)

#      date         fmt
# [1,] "20111001"   "%Y%m%d"
# [2,] "20111031"   "%Y%m%d"
# [3,] "201110"     "%Y%m"
# [4,] "102011"     "%m%Y"
# [5,] "31/10/2011" "%d/%m/%Y"
# [6,] "20111000"   "%Y%m%d"

I only want the year and month. I don't need the day, so I'm not worried that the final day is invalid. R, unfortunately, is:

mapply(as.yearmon, X[, 'date'], X[, 'fmt'], SIMPLIFY=FALSE)

# $`20111001`
# [1] "Oct 2011"

# $`20111031`
# [1] "Oct 2011"

# $`201110`
# [1] "Oct 2011"

# $`102011`
# [1] "Oct 2011"

# $`31/10/2011`
# [1] "Oct 2011"

# $`20111000`
# Error in charToDate(x) : 
#   character string is not in a standard unambiguous format

I know that the usual answer is to fix the day part of the date, e.g. using paste(x, '01', sep=''). I don't think that will work here, because I don't know in advance what the date format will be, and therefore I cannot set the day without converting to some sort of date object first.

pete asked Feb 23 '23 18:02


2 Answers

Assuming the month always follows the year and is always two characters in your date, why not just extract the information with substr? Perhaps something like:

lapply(X[,'date'], 
  function(x) paste(month.abb[as.numeric(substr(x, 5, 6))], substr(x, 1, 4))
  )
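
A minimal sketch of guarding that assumption, in case it helps. The fixed-position substr approach only fits rows whose format starts with %Y%m, so rows such as '102011' or '31/10/2011' are left as NA here rather than misread. The vectors dates and fmts are illustrative stand-ins for the two columns of X, not something from the original post.

```r
# Illustrative stand-ins for X[, 'date'] and X[, 'fmt'] from the question.
dates <- c("20111001", "201110", "102011", "31/10/2011", "20111000")
fmts  <- c("%Y%m%d", "%Y%m", "%m%Y", "%d/%m/%Y", "%Y%m%d")

# Fixed-position extraction is only safe when the string starts with YYYYMM.
ok <- startsWith(fmts, "%Y%m")

# Rows that do not fit the assumption stay NA instead of being misparsed.
ym <- rep(NA_character_, length(dates))
ym[ok] <- paste(month.abb[as.integer(substr(dates[ok], 5, 6))],
                substr(dates[ok], 1, 4))
print(ym)
```

The rows left as NA would still need a format-aware parser.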

wkmor1 answered Feb 25 '23 12:02


You don't need to specify the day in your format if you don't need it. Read ?strptime carefully. The second paragraph in the Details section says:

Each input string is processed as far as necessary for the format specified: any trailing characters are ignored.

So adjust your format and everything should work.

X <- rbind(c(date='20111001', fmt='%Y%m'),
           c('20111031', '%Y%m'),
           c('201110',   '%Y%m'),
           c('102011',   '%m%Y'),
           c('20111000', '%Y%m'))
mapply(as.yearmon, X[, 'date'], X[, 'fmt'], SIMPLIFY=FALSE)
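
If the formats arrive as data, as in the question, the adjustment could be automated. What follows is a rough base R sketch, not part of zoo. The helper names drop_trailing_day and year_month are made up for illustration, and it only handles a %d at the very end of the format. Base strptime returns NA when a month is given without a day, so the sketch prepends a dummy day of 01 to both the string and the format.

```r
# Hypothetical helper: strip a trailing %d so parsing stops before the day.
drop_trailing_day <- function(fmt) sub("%d$", "", fmt)

# Hypothetical helper: parse one string with one format and return "YYYY-MM".
year_month <- function(x, fmt) {
  fmt <- drop_trailing_day(fmt)
  # Prepend a dummy day; trailing characters in x are ignored by strptime.
  tm <- strptime(paste("01", x), paste("%d", fmt), tz = "UTC")
  format(tm, "%Y-%m")
}

# The six rows from the question, with their original formats.
dates <- c("20111001", "20111031", "201110", "102011", "31/10/2011", "20111000")
fmts  <- c("%Y%m%d", "%Y%m%d", "%Y%m", "%m%Y", "%d/%m/%Y", "%Y%m%d")

res <- mapply(year_month, dates, fmts, USE.NAMES = FALSE)
print(res)
```

A format with the day in the middle and an invalid day value would still come back as NA, so this is only a partial answer to not knowing the format in advance.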

Joshua Ulrich answered Feb 25 '23 11:02