
Date comparison in Hive

I'm working with Hive and I have a table structured as follows:

CREATE TABLE t1 (
  id INT,
  created TIMESTAMP,
  some_value BIGINT
);

I need to find every row in t1 that is less than 180 days old. The following query yields no rows even though there is data present in the table that matches the search predicate.

select * 
from t1 
where created > date_sub(from_unixtime(unix_timestamp()), 180);

What is the appropriate way to perform a date comparison in Hive?

asked Dec 28 '12 at 15:12 by Jeremiah Peschka


2 Answers

How about:

where unix_timestamp() - created < 180 * 24 * 60 * 60

Date math is usually simplest if you can just do it with the actual timestamp values.
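Here is a minimal Python sketch of the same seconds-based cutoff. It assumes `created` holds epoch seconds, as the expression `unix_timestamp() - created` implies. The `rows_newer_than` helper and the sample rows are hypothetical and only mirror the arithmetic of the WHERE clause:

```python
import time

# 180 days expressed in seconds, matching 180 * 24 * 60 * 60 in the Hive query
CUTOFF_SECONDS = 180 * 24 * 60 * 60


def rows_newer_than(rows, now=None, max_age=CUTOFF_SECONDS):
    """Keep rows whose 'created' value (epoch seconds) is less than max_age old.

    Mirrors: WHERE unix_timestamp() - created < 180 * 24 * 60 * 60
    """
    if now is None:
        now = int(time.time())  # plays the role of unix_timestamp()
    return [r for r in rows if now - r["created"] < max_age]


if __name__ == "__main__":
    now = 1_700_000_000  # fixed "current time" so the result is reproducible
    rows = [
        {"id": 1, "created": now - 10 * 86400},   # 10 days old
        {"id": 2, "created": now - 200 * 86400},  # 200 days old
    ]
    print([r["id"] for r in rows_newer_than(rows, now=now)])  # prints [1]
```

Because the comparison is a strict `<`, a row exactly 180 days old is excluded.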

Or do you want the cutoff to fall on whole days? Then I think the problem is with how you are converting back and forth between integers and strings. Try:

where created > unix_timestamp(date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),180),'yyyy-MM-dd')

Walking through each UDF:

  1. unix_timestamp() returns an integer (BIGINT): the current time in seconds since the epoch
  2. from_unixtime(..., 'yyyy-MM-dd') converts that to a string in the given format, e.g. '2012-12-28'
  3. date_sub(..., 180) subtracts 180 days from that string and returns a new string in the same format
  4. unix_timestamp(..., 'yyyy-MM-dd') converts that string back to an integer (seconds since the epoch)
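To check the whole-day version outside Hive, here is a rough Python model of the four steps. It assumes UTC throughout, whereas Hive typically interprets these conversions in the server's local time zone. `whole_day_cutoff` is a hypothetical helper name, not a Hive function:

```python
from datetime import datetime, timedelta, timezone


def whole_day_cutoff(now_epoch, days=180):
    """Epoch seconds at midnight (UTC) `days` days before the date of now_epoch.

    Models: unix_timestamp(date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), 180), 'yyyy-MM-dd')
    Assumption: UTC is used here instead of the Hive server's time zone.
    """
    # Steps 1-2: epoch seconds -> 'yyyy-MM-dd' string
    day_str = datetime.fromtimestamp(now_epoch, tz=timezone.utc).strftime("%Y-%m-%d")
    # Step 3: subtract whole days, keeping the 'yyyy-MM-dd' format
    shifted = datetime.strptime(day_str, "%Y-%m-%d") - timedelta(days=days)
    cutoff_str = shifted.strftime("%Y-%m-%d")
    # Step 4: 'yyyy-MM-dd' string -> epoch seconds (midnight of that day, UTC)
    parsed = datetime.strptime(cutoff_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


if __name__ == "__main__":
    # 2012-12-28 15:12:00 UTC
    print(whole_day_cutoff(1356707520))  # prints 1341100800 (2012-07-01 00:00:00 UTC)
```

Since the time of day is dropped in step 2, the cutoff always lands on midnight. A row then qualifies when `created` is later than that value, regardless of the current hour.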

If that's all getting too hairy, you can always write a UDF to do it yourself.

answered Oct 14 '22 at 05:10 by Joe K


Alternatively, you can use datediff. If created is a string timestamp (JDBC format, e.g. 'yyyy-MM-dd HH:mm:ss'), the WHERE clause would be:

datediff(from_unixtime(unix_timestamp()), created) < 180;

If created holds Unix epoch time (seconds):

datediff(from_unixtime(unix_timestamp()), from_unixtime(created)) < 180;
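Here is a small Python model of the datediff approach. It relies on Hive's documented behaviour that datediff(enddate, startdate) counts the days between the date parts and ignores the time of day; the Hive docs give datediff('2009-03-01', '2009-02-27') = 2. `hive_datediff` and `epoch_to_string` are hypothetical helpers, and UTC is assumed for the epoch conversion:

```python
from datetime import date, datetime, timezone


def hive_datediff(enddate, startdate):
    """Days from startdate to enddate, using only the 'yyyy-MM-dd' date parts.

    Models Hive's datediff(enddate, startdate); any ' HH:mm:ss' suffix is ignored.
    """
    end = date.fromisoformat(enddate[:10])
    start = date.fromisoformat(startdate[:10])
    return (end - start).days


def epoch_to_string(epoch_seconds):
    """Stand-in for from_unixtime(): epoch seconds -> 'yyyy-MM-dd HH:mm:ss' (UTC assumed)."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    now = "2012-12-28 15:12:00"
    # String timestamp case: datediff(now, created) < 180
    print(hive_datediff(now, "2012-07-02 08:00:00") < 180)  # True (179 days)
    # Epoch case: datediff(now, from_unixtime(created)) < 180
    print(hive_datediff(now, epoch_to_string(1341100800)) < 180)  # False (180 days)
```

Like the whole-day version of the first answer, this counts calendar days. A row created late on the boundary day is therefore treated the same as one created at midnight.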
answered Oct 14 '22 at 04:10 by Lorand Bendig