Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HIVE - group by date function

Can anyone tell me why I'm not getting counts for each f0, MONTH, DAY, HOUR, MINUTE group in my result set?

Result set:

result set

Query:

SELECT t.f0, MONTH(TO_DATE(Hex2Dec(t.f2))), DAY(TO_DATE(Hex2Dec(t.f2))), HOUR(TO_DATE(Hex2Dec(t.f2))), MINUTE(TO_DATE(Hex2Dec(t.f2))), COUNT(DISTINCT t.f1)  
FROM table t  
WHERE (t.f0 = 1 OR t.f0 = 2)  
AND (t.f3 >= '2013-02-06' AND t.f3 < '2013-02-15')   
AND (Hex2Dec(t.f2) >= 1360195200 AND Hex2Dec(t.f2) < 1360800000)  
AND *EXTRA CONDITIONS*  
GROUP BY t.f0, MONTH(TO_DATE(Hex2Dec(t.f2))), DAY(TO_DATE(Hex2Dec(t.f2))),   HOUR(TO_DATE(Hex2Dec(t.f2))), MINUTE(TO_DATE(Hex2Dec(t.f2)))  

Schema:

f0 INT (Partition Column)
f1 INT
f2 STRING
f3 STRING (Partition Column)
f4 STRING
f5 STRING
f6 STRING
f7 MAP<STRING,STRING>

*f2 is a unix timestamp in Hexadecimal format

like image 768
Jubbles Avatar asked Feb 15 '13 20:02

Jubbles


1 Answers

This might be because to_date returns null when it's applied on a unix time.
According to the Hive manual:

to_date(string timestamp):
Returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"

Use from_unixtime instead to get back the correct date parts.

Note:
I assume Hex2Dec UDF is taken from the core library of HIVE-1545

like image 167
Lorand Bendig Avatar answered Sep 20 '22 12:09

Lorand Bendig