Hi i trying to find out what the difference between using a TRUNC and TO_DATE is in Hive.
Currently within oracle I wrote the following case statement against the Data shown below:
ORDER_NO | NAME | DATE_ | TASK_NO
ABC123 | Humpty | 07-OCT-16 12:30:54 | 1
ABC123 | Humpty | 07-OCT-16 12:30:54 | 2
ABC123 | Humpty | 07-OCT-16 12:32:20 | 6
SELECT ORDER_NO, NAME, DATE_, TASK_NO
(CASE WHEN DATE_ - LAG(DATE_) OVER (PARTITION BY ORDER_NO, NAME, TRUNC(DATE_) ORDER BY DATE_) <= 1/48
THEN 0 ELSE 1 END) AS COUNT1
and this gives me the result:
ORDER_NO | NAME | DATE_ | TASK_NO | COUNT1
ABC123 | Humpty | 07-OCT-16 12:30:54 | 1 | 1
ABC123 | Humpty | 07-OCT-16 12:30:54 | 2 | 0
ABC123 | Humpty | 07-OCT-16 12:32:20 | 6 | 1
which is correct. Now if i use the same query in Hive against my full data set I get an error message:
Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns.
So I changed the TRUNC to TO_DATE and this works and gives me the following results:
SELECT ORDER_NO, NAME, DATE_, TASK_NO
(CASE WHEN DATE_ - LAG(DATE_) OVER (PARTITION BY ORDER_NO, NAME, TO_DATE(DATE_) ORDER BY DATE_) <= 1/48
THEN 0 ELSE 1 END) AS COUNT1
and this gives me the result:
ORDER_NO | NAME | DATE_ | TASK_NO | COUNT1
ABC123 | Humpty | 07-OCT-16 12:30:54 | 1 | 1
ABC123 | Humpty | 07-OCT-16 12:32:20 | 6 | 1
ABC123 | Humpty | 07-OCT-16 12:30:54 | 2 | 1
which is different to what I get in Oracle. From what I can gather the date value is stored as a string as the results arent ordered in Date/Time and this is where I think the problem lies but not sure what changes I need to make to fix it.
Would really appreciate some advice.
UPDATED CODE:
SELECT
ORDER_NO
,NAME
,DATE_FIXED
,TASK_NO
,CASE WHEN DATE_UTS - LAG(DATE_UTS) OVER (PARTITION BY ORDER_NO, NAME, TO_DATE(DATE_FIXED) ORDER BY DATE_FIXED) <= 60*30
THEN 0
ELSE 1
END AS COUNT1
FROM
(
SELECT
ORDER_NO
,NAME
,TASK_NO
,FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_, 'DD-MMM-YY HH:MM:SS')) AS DATE_FIXED
,UNIX_TIMESTAMP(DATE_, 'DD-MMM-YY HH:MM:SS') AS DATE_UTS
FROM TABLE1
) T
Hive Operators and User-Defined Functions (UDFs)
to_date
Returns the date part of a timestamp string (pre-Hive 2.1.0):
to_date("1970-01-01 00:00:00") = "1970-01-01".
As of Hive 2.1.0, returns a date object.
Prior to Hive 2.1.0 (HIVE-13248) the return type was a String because no Date type existed when the method was created.
trunc
Returns date truncated to the unit specified by the format (as of Hive 1.2.0).
Supported formats: MONTH/MON/MM, YEAR/YYYY/YY.
Example: trunc('2015-03-17', 'MM') = 2015-03-01.
You have errors in your original query
TASK_NO
and (CASE WHEN
Trunc
in Hive must take 1 parameter, and there is no parameter for day.The only recognize date format in Hive is YYYY-MM-DD which does not match your data.
Applying date functions on invalid string result in NULL.
This is how you convert your data format to dates:
hive> select from_unixtime(unix_timestamp('07-OCT-16 12:30:54','dd-MMM-yy HH:mm:ss'));
OK
2016-10-07 12:30:54
and the whole query:
select ORDER_NO
,NAME
,DATE_fixed
,TASK_NO
,case
when DATE_uts
- LAG(DATE_uts) OVER
(
PARTITION BY ORDER_NO,NAME,to_date(DATE_fixed)
ORDER BY DATE_fixed
)
<= 60*30
then 0
else 1
end AS COUNT1
from (select ORDER_NO
,NAME
,TASK_NO
,from_unixtime(unix_timestamp(DATE_,'dd-MMM-yy HH:mm:ss')) as DATE_fixed
,unix_timestamp(DATE_,'dd-MMM-yy HH:mm:ss') as DATE_uts
from t
) t
;
ABC123 Humpty 2016-10-07 12:30:54 2 1
ABC123 Humpty 2016-10-07 12:30:54 1 0
ABC123 Humpty 2016-10-07 12:32:20 6 0
This were also the results when I tested it on Oracle
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With