Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive: More clean way to SELECT AS and GROUP BY

I try to write Hive Sql like that

SELECT count(1), substr(date, 1, 4) as year
FROM ***
GROUP BY year

But Hive cannot recognize the alias name 'year', it complains that: FAILED: SemanticException [Error 10004]: Line 1:79 Invalid table alias or column reference 'year'

One solution(Hive: SELECT AS and GROUP BY) suggest to use 'GROUP BY substr(date, 1, 4)'.

It works! However in some cases the value I want to group by may be generated from multiple lines of hive function code, it's very ugly to write code like

SELECT count(1), func1(func2(..........................)) AS something
FROM ***
GROUP BY func1(func2(..........................))

Is there any clean way in Hive to do that? Any suggestions?

like image 853
twds Avatar asked Apr 04 '15 05:04

twds


People also ask

Does Hive prefer normalization?

Hive's massive parallelism eliminates many of the disk-I/O limitations of an RDBMS, reducing the value of normalization for reducing data volume. Hive is often used with data volumes for which it would impractical to use normalized data.

Does GROUP BY work in Hive?

Group by query:Group by clause use columns on Hive tables for grouping particular column values mentioned with the group by. For whatever the column name we are defining a “groupby” clause the query will selects and display results by grouping the particular column values.

How do I select only few rows in Hive?

Solution. Order the records first and then apply the LIMIT clause to limit the number of records.


2 Answers

In Hive 0.11.0 and later, columns can be specified by position if hive.groupby.orderby.position.alias is set to true (the default is false). So setting set hive.groupby.orderby.position.alias=true; in your .hql (or .hiverc for a permanent solution) will do the trick and then you can type group by 2 for the above example. Source: hive language manual

like image 137
Angelo Di Donato Avatar answered Dec 24 '22 03:12

Angelo Di Donato


Specifying the position in Group By will solve your issue. This position number in Group By works even when SET hive.groupby.orderby.position.alias=false; (Hive 0.12)

SELECT count(1), substr(date, 1, 4) as year  
FROM ***
GROUP BY 2;
like image 22
Partha Kaushik Avatar answered Dec 24 '22 04:12

Partha Kaushik