Hive: A cleaner way to SELECT AS and GROUP BY

Tags: hadoop, hive, hiveql

I am trying to write a Hive SQL query like this:

SELECT count(1), substr(date, 1, 4) as year
FROM ***
GROUP BY year

But Hive does not recognize the alias 'year' and reports: FAILED: SemanticException [Error 10004]: Line 1:79 Invalid table alias or column reference 'year'

One solution (Hive: SELECT AS and GROUP BY) suggests using 'GROUP BY substr(date, 1, 4)'.

It works! However, in some cases the value I want to group by is produced by several lines of Hive function calls, and it is very ugly to repeat the whole expression like this:

SELECT count(1), func1(func2(..........................)) AS something
FROM ***
GROUP BY func1(func2(..........................))

Is there a cleaner way to do this in Hive? Any suggestions?

853

asked Apr 04 '15 05:04

twds

2 Answers

In Hive 0.11.0 and later, columns can be specified by position if hive.groupby.orderby.position.alias is set to true (the default is false). So adding the line set hive.groupby.orderby.position.alias=true; to your .hql file (or to .hiverc for a permanent solution) will do the trick, and you can then write GROUP BY 2 in the example above. Source: Hive Language Manual
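As a minimal sketch of what such a script could look like: the table name events and the column event_date below are hypothetical stand-ins, since the question elides the real table. Note also that, as far as I know, Hive 2.2.0 and later use the separate property hive.groupby.position.alias for this, so check the manual for your version.

```sql
-- Allow positional references in GROUP BY / ORDER BY
-- (property name as given above; applies to Hive 0.11.0 through 2.1.x).
SET hive.groupby.orderby.position.alias=true;

-- events and event_date are hypothetical names used only for illustration.
-- event_date is assumed to be a string such as '2015-04-04'.
SELECT count(1) AS cnt,
       substr(event_date, 1, 4) AS yr
FROM events
GROUP BY 2;  -- 2 = the second SELECT expression, substr(event_date, 1, 4)
```

The position refers to the order of expressions in the SELECT list, so reordering the SELECT list means updating the number in GROUP BY as well.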

137

answered Dec 24 '22 03:12

Angelo Di Donato

Specifying the column position in GROUP BY will solve your issue. The position number in GROUP BY works even when SET hive.groupby.orderby.position.alias=false; (Hive 0.12)

SELECT count(1), substr(date, 1, 4) as year  
FROM ***
GROUP BY 2;
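If positional references feel fragile, one alternative that is not part of the answers here, offered only as a sketch, is to compute the expression once in a subquery and group by its alias in the outer query. The names events, event_date, yr and t are hypothetical.

```sql
-- Compute the long expression once in the inner query, then group by its alias.
-- events, event_date, yr and t are hypothetical names used only for illustration.
SELECT t.yr,
       count(1) AS cnt
FROM (
  SELECT substr(event_date, 1, 4) AS yr
  FROM events
) t               -- Hive requires an alias for a subquery in FROM
GROUP BY t.yr;
```

This keeps the expression in a single place without depending on any configuration property, at the cost of one extra level of nesting.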

answered Dec 24 '22 04:12

Partha Kaushik
