strptime, as.POSIXct and as.Date return unexpected NA

When I try to parse a timestamp in the following format: "Thu Nov 8 15:41:45 2012", only NA is returned.

I am using Mac OS X, R 2.15.2 and RStudio 0.97.237. The language of my OS is Dutch, and I presume this has something to do with it.

When I try strptime, NA is returned:

var <- "Thu Nov 8 15:41:45 2012"
strptime(var, "%a %b %d %H:%M:%S %Y")
# [1] NA

Neither does as.POSIXct work:

as.POSIXct(var, format = "%a %b %d %H:%M:%S %Y")
# [1] NA

I also tried as.Date on the string above but without %H:%M:%S components:

as.Date("Thu Nov 8 2012", "%a %b %d %Y")
# [1] NA

Any ideas what I could be doing wrong?

asked Dec 05 '12 15:12 by Hemmik



2 Answers

I think it is exactly as you guessed: strptime fails to parse your date-time string because of your locale. Your string contains both an abbreviated weekday name (%a) and an abbreviated month name (%b). These conversion specifications are described in ?strptime:

Details

%a: Abbreviated weekday name in the current locale on this platform

%b: Abbreviated month name in the current locale on this platform.

"Note that abbreviated names are platform-specific (although the standards specify that in the C locale they must be the first three letters of the capitalized English name: [...])"

"Knowing what the abbreviations are is essential if you wish to use %a, %b or %h as part of an input format: see the examples for how to check."

See also

[...] locales to query or set a locale.
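The check that ?strptime alludes to can be sketched as follows. This is a minimal illustration, not the code from the help page; the variable names month_abbr and day_abbr are chosen here for the example.

```r
# Which LC_TIME locale is R currently using?
Sys.getlocale("LC_TIME")

# Names that %b and %a expect in that locale.
# ISOdate() is vectorised, so this builds 12 months and 7 consecutive days.
month_abbr <- format(ISOdate(2012, 1:12, 1), "%b")
day_abbr   <- format(ISOdate(2012, 1, 1:7), "%a")

print(month_abbr)
print(day_abbr)

# Do the tokens of the string being parsed appear among them?
"Nov" %in% month_abbr
"Thu" %in% day_abbr
```

If either of the last two lines returns FALSE, the format string cannot match the input in the current locale.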

Locales are also relevant for as.POSIXct, as.POSIXlt and as.Date.

From ?as.POSIXct:

Details

If format is specified, remember that some of the format specifications are locale-specific, and you may need to set the LC_TIME category appropriately via Sys.setlocale. This most often affects the use of %b, %B (month names) and %p (AM/PM).

From ?as.Date:

Details

Locale-specific conversions to and from character strings are used where appropriate and available. This affects the names of the days and months.


Thus, if weekdays and month names in the string differ from those in the current locale, strptime, as.POSIXct and as.Date fail to parse the string correctly and NA is returned.
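One way to see this on your own machine is a round trip: format a known time with the same format string, parse it back, and compare with the English string. This is only a sketch; the fixed UTC time zone and the variable names are assumptions made for the example.

```r
fmt <- "%a %b %d %H:%M:%S %Y"
ts  <- as.POSIXct("2012-11-08 15:41:45", tz = "UTC")

# format() writes the day and month names of the current locale ...
local_string <- format(ts, fmt, tz = "UTC")

# ... so parsing that string back with the same format should succeed,
round_trip <- as.POSIXct(strptime(local_string, fmt, tz = "UTC"))

# whereas the English string parses only if the locale uses English names.
english <- strptime("Thu Nov 8 15:41:45 2012", fmt, tz = "UTC")

print(local_string)
print(round_trip == ts)
print(is.na(english))
```

In a Dutch or French locale the last line is expected to print TRUE, which is the NA from the question.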

However, you can solve this issue by temporarily changing the LC_TIME locale:

# First save your current locale
loc <- Sys.getlocale("LC_TIME")

# Set the locale that matches the strings to be parsed
# (in this particular case: English), so that the abbreviated weekday
# (e.g. "Thu") and the abbreviated month (e.g. "Nov") are recognized.
# Locale names are platform-specific; use one of the two:
Sys.setlocale("LC_TIME", "en_GB.UTF-8")
# or
Sys.setlocale("LC_TIME", "C")

# Then proceed as you intended
x <- "Thu Nov 8 15:41:45 2012"
strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] "2012-11-08 15:41:45"

# Then set back to your old locale
Sys.setlocale("LC_TIME", loc)
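If you need this in several places, the save / set / restore steps above could be wrapped in a small function. The helper below is a hypothetical sketch (parse_with_locale is not part of base R), and the default time zone "UTC" is an assumption made so the result does not depend on your system settings.

```r
# Hypothetical helper: parse `x` with `format` under a temporary LC_TIME locale.
parse_with_locale <- function(x, format, locale = "C", tz = "UTC") {
  old <- Sys.getlocale("LC_TIME")
  # Restore the caller's locale on exit, even if parsing throws an error.
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", locale)
  as.POSIXct(strptime(x, format, tz = tz))
}

parsed <- parse_with_locale("Thu Nov 8 15:41:45 2012", "%a %b %d %H:%M:%S %Y")
format(parsed, "%Y-%m-%d %H:%M:%S")
# [1] "2012-11-08 15:41:45"
```

Using on.exit() means the original locale comes back even when the parse fails, so the rest of the session keeps printing dates in your own language.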

With my personal locale I can reproduce your error:

Sys.setlocale("LC_TIME", loc)
# [1] "fr_FR.UTF-8"

strptime(x, "%a %b %d %H:%M:%S %Y")
# [1] NA
answered Oct 31 '22 17:10 by plannapus


I was just messing around with the same problem and found this solution much cleaner: there is no need to change any system settings manually, because the lubridate package provides a wrapper function that does this job. All you have to do is set the locale argument:

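library(lubridate)
df <- data.frame(Date = c("23. juni 2014", "1. november 2014", "8. marts 2014", "16. juni 2014", "12. december 2014", "13. august 2014"))
# Locale names are platform-specific: "Danish" works on Windows; elsewhere try e.g. "da_DK.UTF-8"
df$date <- dmy(df$Date, locale = "Danish")
df$date
# [1] "2014-06-23" "2014-11-01" "2014-03-08" "2014-06-16" "2014-12-12" "2014-08-13"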
answered Oct 31 '22 16:10 by m_c