TIMESTAMP format issue in HIVE

I have a Hive table created from a JSON file:

CREATE EXTERNAL TABLE logan_test.t1 (
   name string,
   start_time timestamp
   )
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH SERDEPROPERTIES (
  "timestamp.formats" = "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"
)
LOCATION 's3://t1/';

My timestamp data is in the format yyyy-MM-dd'T'HH:mm:ss.SSSSSS.

I specified the timestamp format in SERDEPROPERTIES as described on this page: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-TimestampstimestampTimestamps

The CREATE statement executed successfully, but SELECT * failed with the following error:

HIVE_BAD_DATA: Error parsing field value '2017-06-01T17:51:15.180400' for field 1: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
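The error says the value must fit the default layout yyyy-mm-dd hh:mm:ss[.fffffffff], which means the pattern from timestamp.formats was not applied. A minimal Python sketch of the mismatch, assuming a simplified regex for the default layout (the helper names `matches_hive_default` and `matches_source_pattern` are hypothetical, not Hive code): the sample value matches its own declared pattern but not the default one, because of the 'T' separator.

```python
import re
from datetime import datetime

# Hypothetical, simplified emulation of the default layout named in the error:
# yyyy-mm-dd hh:mm:ss with an optional fraction of 1 to 9 digits.
HIVE_DEFAULT = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d{1,9})?")


def matches_hive_default(value: str) -> bool:
    """Return True if value fits the default layout from the error message."""
    return HIVE_DEFAULT.fullmatch(value) is not None


def matches_source_pattern(value: str) -> bool:
    """Check the question's pattern yyyy-MM-dd'T'HH:mm:ss.SSSSSS.

    Python's %f accepts 1 to 6 fractional digits, which covers SSSSSS.
    """
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f")
        return True
    except ValueError:
        return False


raw = "2017-06-01T17:51:15.180400"
print(matches_source_pattern(raw))  # True: the data matches its declared pattern
print(matches_hive_default(raw))    # False: the 'T' separator breaks the default layout
```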

Asked Jun 09 '17 at 22:06 by logan


People also ask

What is the format of timestamp in hive?

The default format for DATE in Hive is yyyy-MM-dd, and for TIMESTAMP it is yyyy-MM-dd HH:mm:ss, optionally followed by fractional seconds.
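For illustration, the same two default layouts written as Python strftime patterns. This is only a sketch: the helper names are hypothetical, and the optional fractional-second part is left out.

```python
from datetime import datetime

# Hive's default text layouts expressed as strftime patterns.
HIVE_DATE_FMT = "%Y-%m-%d"                # yyyy-MM-dd
HIVE_TIMESTAMP_FMT = "%Y-%m-%d %H:%M:%S"  # yyyy-MM-dd HH:mm:ss


def to_hive_date(dt: datetime) -> str:
    """Render a datetime in the default DATE layout."""
    return dt.strftime(HIVE_DATE_FMT)


def to_hive_timestamp(dt: datetime) -> str:
    """Render a datetime in the default TIMESTAMP layout (no fraction)."""
    return dt.strftime(HIVE_TIMESTAMP_FMT)


dt = datetime(2017, 6, 1, 17, 51, 15)
print(to_hive_date(dt))       # 2017-06-01
print(to_hive_timestamp(dt))  # 2017-06-01 17:51:15
```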

Is TIMESTAMP a data type in Hive?

Yes. Hive provides the TIMESTAMP and DATE data types. TIMESTAMP stores a traditional UNIX timestamp with optional nanosecond precision, and its text form is yyyy-mm-dd hh:mm:ss[.fffffffff].
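Python's datetime only goes down to microseconds, so nine fractional digits do not fit in it directly. A minimal sketch, assuming we keep the fraction as a separate integer count of nanoseconds (the function `parse_hive_timestamp` is hypothetical, not part of any library):

```python
from datetime import datetime
from typing import Tuple


def parse_hive_timestamp(text: str) -> Tuple[datetime, int]:
    """Split 'yyyy-mm-dd hh:mm:ss[.fffffffff]' into whole seconds and nanoseconds.

    The fraction is returned separately as 0..999_999_999 nanoseconds,
    because datetime cannot represent anything finer than microseconds.
    """
    whole, _, fraction = text.partition(".")
    base = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    if not fraction:
        return base, 0
    if not fraction.isdigit() or len(fraction) > 9:
        raise ValueError(f"fraction must be 1 to 9 digits: {fraction!r}")
    # Right-pad to nine digits: '1804' means 0.1804 s = 180_400_000 ns.
    return base, int(fraction.ljust(9, "0"))


print(parse_hive_timestamp("2017-06-01 17:51:15.123456789"))
```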

How do I get the current timestamp in Hive?

CURRENT_DATE returns the current date and CURRENT_TIMESTAMP returns the current date and time. To work with epoch time, use unix_timestamp() to get the number of seconds since the epoch, and from_unixtime() to convert epoch seconds back to a date and time string.
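A rough Python analogue of the unix_timestamp() / from_unixtime() round trip. It assumes UTC throughout, whereas Hive's functions interpret and render strings in the current system time zone, so results in Hive can differ by the zone offset. The helper names are hypothetical.

```python
import calendar
from datetime import datetime, timezone

# Assumption: UTC everywhere. Hive uses the system time zone instead.
LAYOUT = "%Y-%m-%d %H:%M:%S"


def to_epoch(text: str, fmt: str = LAYOUT) -> int:
    """Rough analogue of unix_timestamp(text): seconds since 1970-01-01 UTC."""
    return calendar.timegm(datetime.strptime(text, fmt).timetuple())


def from_epoch(seconds: int, fmt: str = LAYOUT) -> str:
    """Rough analogue of from_unixtime(seconds): epoch seconds to text."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


epoch = to_epoch("2017-06-01 17:51:15")
print(epoch, from_epoch(epoch))
```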

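How do I query a timestamp column in Hive?

Make sure the filter value has the same type as the column: comparing a TIMESTAMP with a STRING can give unexpected results. Cast the literal explicitly, for example CAST('2017-01-01 22:30:57.375117' AS TIMESTAMP), or use conversion functions such as from_utc_timestamp(timestamp, timezone) or from_unixtime(epoch_seconds).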
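One concrete way a string comparison goes wrong, sketched in Python rather than Hive (Hive's own implicit conversion rules are not modeled here): when the two layouts from the question are mixed, text ordering and time ordering disagree, because 'T' sorts after a space.

```python
from datetime import datetime

# Two instants on the same day, stored as strings in the two layouts
# from the question ('T' separator vs. space separator).
morning_iso = "2017-06-01T09:00:00"
evening_hive = "2017-06-01 23:00:00"

# As strings, 'T' (0x54) sorts after ' ' (0x20), so the morning value
# compares as "later" than the evening one.
print(morning_iso > evening_hive)  # True

# After parsing both into real timestamps, the order is correct.
morning = datetime.strptime(morning_iso, "%Y-%m-%dT%H:%M:%S")
evening = datetime.strptime(evening_hive, "%Y-%m-%d %H:%M:%S")
print(morning > evening)  # False
```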


1 Answer

Jira HIVE-9298, which introduced timestamp.formats, says in its description that the property is for LazySimpleSerDe. I did not find any mention in the documentation that it was implemented for other SerDes, and the error above suggests that JsonSerDe does not apply it.

The workaround is to declare start_time as STRING in the table definition and convert it to a timestamp in the SELECT.

Example for yyyy-MM-dd'T'HH:mm:ss.SSSSSS format:

select timestamp(regexp_replace(start_time, '^(.+?)T(.+?)', '$1 $2')) from logan_test.t1;
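The same string transformation can be checked outside Hive. This Python sketch reuses the answer's pattern; note that Python's re.sub writes group references as \1 and \2 where Hive (Java regex) uses $1 and $2, and it does not model Hive's timestamp() function itself. The helper name `normalize` is hypothetical.

```python
import re
from datetime import datetime

# Same pattern as the Hive query, with Python-style group references.
T_SEPARATOR = re.compile(r"^(.+?)T(.+?)")


def normalize(start_time: str) -> str:
    """Replace the first 'T' with a space, like regexp_replace(..., '$1 $2').

    The lazy second group matches only one character, so the match is just
    '2017-06-01T1'; the rest of the string is left in place unchanged.
    """
    return T_SEPARATOR.sub(r"\1 \2", start_time)


raw = "2017-06-01T17:51:15.180400"
fixed = normalize(raw)
print(fixed)  # 2017-06-01 17:51:15.180400
# The normalized text now parses with a space-separated layout.
print(datetime.strptime(fixed, "%Y-%m-%d %H:%M:%S.%f"))
```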

The following variant works for both yyyy-MM-dd'T'HH:mm:ss.SSSSSS and yyyy-MM-dd HH:mm:ss.SSSSSS (the normal timestamp format), in case the data files contain both:

timestamp(regexp_replace(start_time, '^(.+?)[T ](.+?)','$1 $2'))

A single regular expression like this can normalize several input layouts into the one that Hive's timestamp conversion accepts.
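A Python sketch of the [T ] variant, again mirroring only the string rewrite (with \1 \2 instead of Java's $1 $2) and then parsing with a fixed microsecond layout as a stand-in for Hive's conversion. The name `to_datetime` is hypothetical. Both input layouts end up as the same instant.

```python
import re
from datetime import datetime

# Same pattern as the second Hive expression: separator may be 'T' or a space.
SEPARATOR = re.compile(r"^(.+?)[T ](.+?)")
LAYOUT = "%Y-%m-%d %H:%M:%S.%f"


def to_datetime(value: str) -> datetime:
    """Normalize either layout to 'yyyy-MM-dd HH:mm:ss.SSSSSS' and parse it."""
    return datetime.strptime(SEPARATOR.sub(r"\1 \2", value), LAYOUT)


rows = ["2017-06-01T17:51:15.180400", "2017-06-01 17:51:15.180400"]
parsed = [to_datetime(r) for r in rows]
print(parsed[0] == parsed[1])  # True
```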

Answered Oct 24 '22 at 01:10 by leftjoin