I am new to Hive and Spark.
Consider the following SQL query:
select col1, col2, min(col3), first(col4) from tablename group by col1, col2
Since I don't want to include col4 in the group by, I have used first(col4) (but I still want col4 to be displayed).
I want to write the same query in Hive, but Hive has no first() function.
Ref : https://docs.treasuredata.com/articles/hive-aggregate-functions
I want to write the same query in Spark SQL (Using dataframes).
Similarly, Spark's aggregate functions have no first function. (The available aggregate methods are avg, max, min, sum, count.)
Ref : org.apache.spark.sql.GroupedData
E.g. :
val df1 = sqlContext.sql(" select * from TABLENAME").groupBy("COL1","COL2").agg("COL3" -> "MIN", "COL4" -> "????")
Is there any way to write the same query in Hive and Spark?
If you don't want to include col4 in the group by but still want to select it alongside min(), you need analytic (window) functions:
select col1,
col2,
min(col3) over (partition by col1, col2) as min_col3,
col4
from tablename;
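The same analytic form can be sketched with DataFrames (assuming Spark 1.4+, where Window support was added; the table and column names are taken from the question):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.min
import sqlContext.implicits._

// min(col3) over (partition by col1, col2), keeping col4 on every row
val w = Window.partitionBy("col1", "col2")

val result = sqlContext.table("tablename")
  .select($"col1", $"col2",
          min($"col3").over(w).as("min_col3"),
          $"col4")
```

Note that, like the SQL above, this keeps every input row rather than collapsing each group to one row.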
See docs here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
Alternatively, in Hive you can take the min() of a struct, so that col4 is carried along with the minimum col3:
select col1
,col2
,s34.col3
,s34.col4
from (select col1
,col2
,min(named_struct('col3',col3,'col4',col4)) as s34
from tablename
group by col1
,col2
) t
;
or
select col1
,col2
,s34.col1 as col3
,s34.col2 as col4
from (select col1
,col2
,min(struct(col3,col4)) as s34
from tablename
group by col1
,col2
) t
;
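A sketch of the same struct trick with DataFrames (assuming a Spark version whose min() supports struct ordering, i.e. comparison by the first field; names are from the question):

```scala
import org.apache.spark.sql.functions.{min, struct}

// min() over a struct orders by the first field (col3),
// so col4 rides along with the minimum col3 in each group.
val agged = sqlContext.table("tablename")
  .groupBy("col1", "col2")
  .agg(min(struct("col3", "col4")).as("s34"))
  .select("col1", "col2", "s34.col3", "s34.col4")
```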
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF