Can anyone tell me why I'm not getting counts for each f0, MONTH, DAY, HOUR, MINUTE group in my result set?
Result set:
Query:
SELECT t.f0, MONTH(TO_DATE(Hex2Dec(t.f2))), DAY(TO_DATE(Hex2Dec(t.f2))), HOUR(TO_DATE(Hex2Dec(t.f2))), MINUTE(TO_DATE(Hex2Dec(t.f2))), COUNT(DISTINCT t.f1)
FROM table t
WHERE (t.f0 = 1 OR t.f0 = 2)
AND (t.f3 >= '2013-02-06' AND t.f3 < '2013-02-15')
AND (Hex2Dec(t.f2) >= 1360195200 AND Hex2Dec(t.f2) < 1360800000)
AND *EXTRA CONDITIONS*
GROUP BY t.f0, MONTH(TO_DATE(Hex2Dec(t.f2))), DAY(TO_DATE(Hex2Dec(t.f2))), HOUR(TO_DATE(Hex2Dec(t.f2))), MINUTE(TO_DATE(Hex2Dec(t.f2)))
Schema:
f0 INT (Partition Column)
f1 INT
f2 STRING
f3 STRING (Partition Column)
f4 STRING
f5 STRING
f6 STRING
f7 MAP<STRING,STRING>
*f2 is a unix timestamp in Hexadecimal format
This might be because to_date returns null when it's applied on a unix time.
According to the Hive manual:
to_date(string timestamp):
Returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"
Use from_unixtime instead to get back the correct date parts.
Note:
I assume Hex2Dec UDF is taken from the core library of HIVE-1545
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With