I have a log file that contains a timestamp column. The timestamps are in Unix epoch format.
I want to partition the data by year, month, and day, all derived from that timestamp.
This is what I have so far, but it throws an error:
PARSE ERROR cannot recognize input '(' in column type
Here is my code.
from (
  from raw_data
  MAP ${PREFIX}raw_data.line
  USING 's3://scripts/clean.py'
  AS (timestamp STRING, name STRING)
) map_out
INSERT OVERWRITE TABLE date_base_data_temp PARTITION(year(timestamp), month(timestamp)), day(timestamp)))
select map_out.name;
Hive's from_unixtime() converts Unix epoch seconds into a date-time string in the default format yyyy-MM-dd HH:mm:ss. Pass a pattern as the second argument to get the result in a custom format.
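To make the behaviour concrete, here is a rough Python sketch of what from_unixtime() does with and without a pattern. It is not Hive code: the function below is a hypothetical stand-in, it assumes UTC (Hive uses its own configured time zone), and the pattern uses Python strftime codes rather than Hive's yyyy-MM-dd style.

```python
from datetime import datetime, timezone


def from_unixtime(epoch_seconds, pattern="%Y-%m-%d %H:%M:%S"):
    """Format epoch seconds as text, loosely mirroring Hive's from_unixtime().

    Assumptions: UTC is used here, and `pattern` is a Python strftime
    pattern (the default corresponds to Hive's yyyy-MM-dd HH:mm:ss).
    """
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return moment.strftime(pattern)


print(from_unixtime(0))                  # 1970-01-01 00:00:00
print(from_unixtime(86400, "%Y/%m/%d"))  # 1970/01/02
```

The second call plays the role of the optional pattern argument: same instant, different text layout.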
The date functions are listed below.
UNIX_TIMESTAMP(): returns the current time as the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC).
UNIX_TIMESTAMP(string date): converts a date string in the format 'yyyy-MM-dd HH:mm:ss' into a Unix timestamp in seconds, using the default time zone.
trunc(timestamp, str unit): truncates the given date or timestamp to the specified unit (for example, the start of its month or year) and returns the result as a string.
last_day(str date): returns the last day of the month that the given date falls in, as a string.
Solution. CURRENT_DATE gives the current date and CURRENT_TIMESTAMP gives the current date and time. To work with epoch time, use unix_timestamp() to get the epoch value and from_unixtime() to convert an epoch value back to a date and time.
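The round trip between a date string and epoch seconds can be illustrated outside Hive as well. This is only a sketch in Python with hypothetical helper names (to_epoch, to_text); it assumes UTC throughout, whereas Hive's unix_timestamp(string) and from_unixtime() use the configured time zone.

```python
import calendar
import time

# Python equivalent of Hive's default pattern yyyy-MM-dd HH:mm:ss
DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_epoch(text):
    """Parse a 'yyyy-MM-dd HH:mm:ss' string into epoch seconds (UTC assumed)."""
    parsed = time.strptime(text, DEFAULT_FORMAT)
    return calendar.timegm(parsed)


def to_text(epoch_seconds):
    """Format epoch seconds back into the default string form (UTC assumed)."""
    return time.strftime(DEFAULT_FORMAT, time.gmtime(epoch_seconds))


print(to_epoch("2001-09-09 01:46:40"))  # 1000000000
print(to_text(1000000000))              # 2001-09-09 01:46:40
```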
Oof, that looks ugly. Try using this function in Hive:
SELECT from_unixtime(unix_timestamp) as new_timestamp from raw_data ...
Or, if the timestamp is in milliseconds instead of seconds:
SELECT from_unixtime(unix_timestamp DIV 1000) as new_timestamp from raw_data ...
That converts a Unix timestamp into yyyy-MM-dd HH:mm:ss format. You can then use the following functions to get the year, month, and day:
SELECT year(new_timestamp) as year, month(new_timestamp) as month, day(new_timestamp) as day ...
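The parse error in the question suggests Hive does not accept function calls inside PARTITION(...), so the year, month, and day need to exist as plain columns before the insert. The queries above do that in Hive. Since the MAP step already runs a Python script, another option might be to compute the three values there. The sketch below is only an illustration: partition_fields is a hypothetical helper, not part of the original clean.py, and it assumes UTC rather than Hive's session time zone.

```python
from datetime import datetime, timezone


def partition_fields(epoch, in_milliseconds=False):
    """Split an epoch value into (year, month, day).

    Mirrors from_unixtime(...) followed by year(), month(), day().
    Assumptions: UTC is used, and `epoch` may arrive as a string,
    as it would from a text-based mapper.
    """
    value = int(epoch)
    if in_milliseconds:
        value //= 1000  # integer division, like DIV 1000 in the Hive query
    moment = datetime.fromtimestamp(value, tz=timezone.utc)
    return moment.year, moment.month, moment.day


print(partition_fields("1000000000"))                          # (2001, 9, 9)
print(partition_fields(1000000000123, in_milliseconds=True))   # (2001, 9, 9)
```

A mapper could append these three values to each output row, and the insert would then name them as ordinary partition columns.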
More recent releases of Hive and Spark SQL provide a date data type and type casting. The following should work in Hive as well as Spark SQL:
SELECT cast(from_unixtime(epoch_datetime) as date) from myHiveTable