 

Error in ToDate function in Pig

I have datetime data in my input and would like to load it correctly in Pig. I googled and learned that the suggested approach is to load it as chararray and then convert it to datetime with the ToDate function. However, the same script works for one input but not for another, even though both have the identical data format. My Pig version is 0.12.1. The script I'm using:

A = load '/user/ss/debug/debug' using PigStorage(',') as (AUDIT:chararray,JOB:chararray,TYPE:chararray,ID:long,STATUS_ID:long,POOL_NAME:chararray,SLA_PRIORITY:long,STATUS:chararray,RUN_ID:long,TASK:chararray,SCENARIO_ID:long,CREDIT_CNT:long,COMM_CNT:long,BONUS_CNT:long,PAYMENT_CNT:long,RUN_TIME:long,START_TIME:chararray,END_TIME:chararray,ITEM_COUNT:long); 

B = foreach A generate JOB, TYPE, ID, CREDIT_CNT, COMM_CNT, BONUS_CNT, PAYMENT_CNT, ToDate(START_TIME, 'yyyy-MM-dd HH:mm:ss') as (START_TIME_DT:datetime), ToDate(END_TIME, 'yyyy-MM-dd HH:mm:ss') as (END_TIME_DT:datetime), START_TIME, END_TIME, ITEM_COUNT; 

dump B;

The data looks like the following:

Input that reports errors:

D789FD70FE9E3ABBE0432165880A09E1,D789FD70FE9D3ABBE0432165880A09E1,VA,123,4946586,DEFAULT,1,Completed,,DD13,,0,0,0,0,0,2013-03-10 02:41:14,2013-03-10 02:41:16,0

Input that runs correctly:

C888E618A7740A71E0432165880ABCA3,C888E618A7730A71E0432165880ABCA3,VA,123,4680120,DEFAULT,1,Completed,,DD12,,0,0,0,0,0,2012-08-31 04:16:56,2012-08-31 04:17:02,0
C888FC5DA4B212F3E0432165880A3C34,C888FC5DA4B112F3E0432165880A3C34,VA,123,4680125,DEFAULT,1,Completed,,DD12,,0,0,0,0,0,2012-08-31 04:17:51,2012-08-31 04:17:57,0
C888FC5DA4B912F3E0432165880A3C34,C888FC5DA4B812F3E0432165880A3C34,VA,123,4680127,DEFAULT,1,Completed,,DD14,,0,0,0,0,0,2012-08-31 04:18:17,2012-08-31 04:18:22,0

I don't understand why the same script and input schema can produce different results. The error says "Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)".

The error log looks like the following:

Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-120 Operator Key: scope-120) children: null at []]: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:707)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:352)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)
	at org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:336)
	at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:672)
	at org.apache.pig.builtin.ToDate2ARGS.exec(ToDate2ARGS.java:45)
	at org.apache.pig.builtin.ToDate2ARGS.exec(ToDate2ARGS.java:33)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDateTime(POUserFunc.java:422)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:329)
	... 13 more

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias C. Backend error : Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-120 Operator Key: scope-120) children: null at []]: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias C. Backend error : Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-120 Operator Key: scope-120) children: null at []]: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)
	at org.apache.pig.PigServer.openIterator(PigServer.java:870)
	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
	at org.apache.pig.Main.run(Main.java:541)
	at org.apache.pig.Main.main(Main.java:156)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-120 Operator Key: scope-120) children: null at []]: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:707)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:352)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.IllegalArgumentException: Cannot parse "2013-03-10 02:41:14": Illegal instant due to time zone offset transition (America/Los_Angeles)
	at org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:336)
	at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:672)
	at org.apache.pig.builtin.ToDate2ARGS.exec(ToDate2ARGS.java:45)
	at org.apache.pig.builtin.ToDate2ARGS.exec(ToDate2ARGS.java:33)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDateTime(POUserFunc.java:422)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:329)

Any help or suggestion will be highly appreciated. Thanks a lot!

user2830451 asked May 26 '26 07:05


1 Answer

It looks like the datetime "2013-03-10 02:41:14" doesn't exist in the 'America/Los_Angeles' time zone. This is most likely due to daylight saving time in the US: on 2013-03-10 the clocks there jumped forward from 02:00 to 03:00, so no local time between those two exists on that day. The same inputs work fine in my time zone. To solve this issue, try specifying the time zone 'America/Los_Angeles' as the third argument of the ToDate function.

Can you change the ToDate function like this?

ToDate(START_TIME, 'yyyy-MM-dd HH:mm:ss','America/Los_Angeles') 
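
To see why only the 2013-03-10 row is rejected, here is a minimal Java sketch that checks whether a local timestamp exists in a given zone. It uses the JDK's java.time rather than the Joda-Time library that Pig calls in the stack trace, so it illustrates the daylight saving gap itself, not Pig's internals. The class name DstGapCheck and the method existsInZone are made up for this example.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of Pig or Joda-Time): reports whether a
// wall-clock timestamp maps to a real instant in the given time zone.
class DstGapCheck {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static boolean existsInZone(String text, String zoneId) {
        // Parse without any zone first, so the gap cannot break parsing here.
        LocalDateTime local = LocalDateTime.parse(text, FORMAT);
        // Inside a daylight saving gap the zone has no valid offset
        // for this local time, so the returned list is empty.
        return !ZoneId.of(zoneId).getRules().getValidOffsets(local).isEmpty();
    }

    public static void main(String[] args) {
        // Failing row from the question: falls in the 02:00-03:00 gap.
        System.out.println(existsInZone("2013-03-10 02:41:14", "America/Los_Angeles")); // false
        // The same text is a valid time in a zone without daylight saving.
        System.out.println(existsInZone("2013-03-10 02:41:14", "UTC")); // true
        // Working row from the question: no transition on that day.
        System.out.println(existsInZone("2012-08-31 04:16:56", "America/Los_Angeles")); // true
    }
}
```

If the timestamps were actually recorded in UTC (an assumption; the question does not say which zone the data uses), passing 'UTC' as the third argument to ToDate would be the matching choice.
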
Sivasakthi Jayaraman answered Jun 01 '26 13:06


