
How to define current time zone in Azure ML for strptime function, unknown timezone 'localtime'

All of the dates that I've manipulated in the Execute R module in Azure Machine Learning are written out as blanks in the output: the date columns exist, but they contain no values.

The source variables which contain date information that I’m reading into the data frame have two different date formats. They are as follows:

usage$Date1 = c('8/6/2015', '8/20/2015', '7/9/2015')
usage$Date2 = c('4/16/2015 0:00', '7/1/2015 0:00', '7/1/2015 0:00')

I inspected the log file in AML, and AML can't find the local time zone. The warnings in the log file are:

[ModuleOutput] 1: In strptime(x, format, tz = tz) :
[ModuleOutput] unable to identify current timezone 'C':
[ModuleOutput] please set environment variable 'TZ'
[ModuleOutput]
[ModuleOutput] 2: In strptime(x, format, tz = tz) : unknown timezone 'localtime'

I referred to another answer about setting the default time zone for strptime: "unknown timezone name in R strptime/as.POSIXct".

I changed my code to set the TZ environment variable explicitly:

Sys.setenv(TZ='GMT')


#### Data frame usage cleanup, format and labeling
usage <- as.data.frame(usage)
usage$Date1 <- as.character(usage$Date1)
usage$Date1 <- as.POSIXct(usage$Date1, format = "%m/%d/%Y", tz = "GMT")
usage$Date1 <- format(usage$Date1, "%m/%d/%Y")
usage$Date1 <- as.Date(usage$Date1, "%m/%d/%Y")
usage <- as.data.frame(usage)

usage$Date2 <- as.POSIXct(usage$Date2, format = "%m/%d/%Y", tz = "GMT")
usage$Date2 <- format(usage$Date2, "%m/%d/%Y")
usage$Date2 <- as.Date(usage$Date2, "%m/%d/%Y")
usage <- as.data.frame(usage)

The problem persists: Azure ML still writes these columns out as blanks.
(This code works in RStudio, where I presume the local time zone is taken from the system.)
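
One way to compare the two environments is to print what each R session reports about its time zone. The following is a minimal sketch, not a confirmed fix: it only uses base R calls, and the sample date string is taken from the question. Running it in both RStudio and the Azure ML module should show whether the TZ variable is actually set where the script runs.

```r
# Report the time zone settings the current R session sees.
tz_env <- Sys.getenv("TZ", unset = NA)   # NA when TZ is not set at all
tz_sys <- Sys.timezone()                 # may be NA if it cannot be determined

cat("TZ environment variable:", tz_env, "\n")
cat("Sys.timezone():         ", tz_sys, "\n")

# Parsing with an explicit tz argument does not rely on either value above.
parsed <- as.POSIXct("8/6/2015", format = "%m/%d/%Y", tz = "GMT")
print(parsed)
```

If the first two lines differ between environments while the parsed value is the same, the blank columns are probably not caused by the parsing step itself.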

After reading two blog posts on this problem, it seems that Azure ML doesn't support some date time formats:

http://blogs.msdn.com/b/andreasderuiter/archive/2015/02/03/troubleshooting-error-1000-rpackage-library-exception-failed-to-convert-robject-to-dataset-when-running-r-scripts-in-azure-ml.aspx

http://www.mikelanzetta.com/2015/01/data-cleaning-with-azureml-and-r-dates/

So I tried converting to POSIXct before sending the data to the output stream, as follows:

tenantusage$Date1 = as.POSIXct(tenantusage$Date1, "%m/%d/%Y", tz = "EST5EDT")
tenantusage$Date2 = as.POSIXct(tenantusage$Date2, "%m/%d/%Y", tz = "EST5EDT")

But I encounter the same problem. The information in these variables is not written to the output: the Date1 and Date2 columns are blank.
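
To narrow down where the values are lost, it may help to check the columns just before they are sent to the output. The sketch below rebuilds the sample values from the question (the data frame construction is an assumption, since the real data comes from the module input) and reports each column's class and how many values failed to parse. If nothing is NA here, parsing succeeded and the blanks are more likely introduced when the data frame is written out.

```r
# Sample values copied from the question; the real data comes from the module input.
tenantusage <- data.frame(
  Date1 = c("8/6/2015", "8/20/2015", "7/9/2015"),
  Date2 = c("4/16/2015 0:00", "7/1/2015 0:00", "7/1/2015 0:00"),
  stringsAsFactors = FALSE
)

tenantusage$Date1 <- as.POSIXct(tenantusage$Date1, format = "%m/%d/%Y", tz = "EST5EDT")
tenantusage$Date2 <- as.POSIXct(tenantusage$Date2, format = "%m/%d/%Y", tz = "EST5EDT")

# Class of each column and the number of values that failed to parse.
col_class <- sapply(tenantusage, function(col) class(col)[1])
na_count  <- sapply(tenantusage, function(col) sum(is.na(col)))
print(col_class)
print(na_count)
```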

Please advise!

thanks

SingingData asked Oct 29 '15 00:10


1 Answer

Hi SingingData and SochiX,

Sorry to hear about this source of frustration! I find that the following variation on SingingData's code sample works for me (tested in a CRAN 3.1.0 module):

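# Sample data in the two formats from the question
usage <- data.frame(list(Date1 = c('8/6/2015', '8/20/2015', '7/9/2015'),
                         Date2 = c('4/16/2015 0:00', '7/1/2015 0:00', '7/1/2015 0:00')))

# Parse with an explicit time zone, using as.POSIXlt() rather than as.POSIXct()
usage$Date1 <- as.POSIXlt(usage$Date1, format = "%m/%d/%Y", tz = "GMT")
usage$Date2 <- as.POSIXlt(usage$Date2, format = "%m/%d/%Y", tz = "GMT")

# Format back to text, which drops the time component
usage$Date1 <- format(usage$Date1, "%m/%d/%Y")
usage$Date2 <- format(usage$Date2, "%m/%d/%Y")

# Convert the text to Date
usage$Date1 <- as.Date(usage$Date1, "%m/%d/%Y")
usage$Date2 <- as.Date(usage$Date2, "%m/%d/%Y")

# Send the data frame to the module's output port
maml.mapOutputPort("usage");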

I've used as.POSIXlt() instead of as.POSIXct(). I hope that this helps unblock your work in R.
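
A shorter variation that might also be worth trying is sketched below. It is not part of the tested code above: `to_date` is a hypothetical helper name, and it assumes that calling `as.Date()` directly on the text is enough, since as far as I know that path parses in GMT and does not consult the local time zone. The last step, converting Date columns to ISO-formatted text, is an optional fallback under the assumption that the output port handles character columns more reliably than date classes; that should be verified in the module itself.

```r
# Sample values copied from the question.
usage <- data.frame(
  Date1 = c("8/6/2015", "8/20/2015", "7/9/2015"),
  Date2 = c("4/16/2015 0:00", "7/1/2015 0:00", "7/1/2015 0:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the answer): parse month/day/year text to Date.
# Trailing text such as " 0:00" is ignored when the format is "%m/%d/%Y".
to_date <- function(x) as.Date(as.character(x), format = "%m/%d/%Y")

usage$Date1 <- to_date(usage$Date1)
usage$Date2 <- to_date(usage$Date2)

# Optional fallback (assumption): send ISO-formatted text instead of Date values.
usage_out <- usage
usage_out[] <- lapply(usage_out, function(col) {
  if (inherits(col, "Date")) format(col, "%Y-%m-%d") else col
})
str(usage_out)
```

In the Execute R module, `usage` or `usage_out` would then be passed to `maml.mapOutputPort()` as in the code above.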

mewahl answered Sep 29 '22 16:09