Hive: UDF and GROUP BY

Tags: hive, hiveql

I have a UDF (GetUrlExt) that returns the file extension of a URI (e.g., jpg for /abc/models/xyz/images/top.jpg). The data looks like this:

Date Time TimeTaken uristem  
9/5/2011 0:00:10 234 /abc/models/xyz/images/top.jpg  
9/5/2011 0:00:11 456 /abc/models/xyz/images/bottom.jpg  
9/5/2011 0:00:14 789 /abc/models/xyz/images/left.gif  
9/5/2011 0:00:16 234 /abc/models/xyz/images/top.pdf  
9/5/2011 0:00:18 734 /abc/models/xyz/images/top.pdf  
9/5/2011 0:00:19 654 /abc/models/xyz/images/right.gif  
9/5/2011 0:00:21 346 /abc/models/xyz/images/top.pdf  
9/5/2011 0:00:24 556 /abc/models/xyz/images/front.pdf  
9/5/2011 0:00:26 134 /abc/models/xyz/images/back.jpg

The query without GROUP BY works fine:

SELECT GetUrlExt(uristem) AS extn FROM LogTable;

Result: jpg jpg gif pdf pdf gif pdf pdf jpg

Now I need 'GROUP BY' on the results of the GetUrlExt UDF.
Expected Result:
jpg 3 274.6
gif 2 721.5
pdf 4 467.5

But the following query is not working:

SELECT GetUrlExt(uristem) AS extn, Count(*) AS PerCount, Avg(TimeTaken) AS AvgTime FROM LogTable GROUP BY extn;

Any kind of help is appreciated!


asked Nov 20 '12 09:11

Srinivas

1 Answer

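Please use a subquery for the GROUP BY.

Hive doesn't let GROUP BY refer to a column alias (here extn) defined in the same SELECT list, so compute the value in a subquery and group by it in the outer query.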

SELECT a.extn, Count(*) AS PerCount, Avg(TimeTaken) AS AvgTime 
FROM
(
    SELECT GetUrlExt(uristem) AS extn, TimeTaken
    FROM LogTable 
) a
GROUP BY a.extn;
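
An alternative that may also work, assuming the failure comes from GROUP BY referencing the alias extn rather than from grouping on an expression: repeat the UDF call in the GROUP BY clause. This is a sketch that has not been run against the asker's table or UDF, and it assumes GetUrlExt returns the same value every time it is given the same input.

```sql
-- Assumes the GetUrlExt UDF and LogTable from the question.
-- The UDF expression is repeated in GROUP BY instead of using the alias extn.
SELECT GetUrlExt(uristem) AS extn,
       COUNT(*)           AS PerCount,
       AVG(TimeTaken)     AS AvgTime
FROM LogTable
GROUP BY GetUrlExt(uristem);
```

The subquery form keeps the UDF call in one place, which is easier to maintain if the expression grows more complex.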


answered Sep 24 '22 05:09

pensz
