Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get row with max value in Hive/SQL?

Tags:

sql

hive

I'm new to Hive/SQL, and I'm stuck on a fairly simple problem. My data looks like:

+------------+--------------------+-----------------------+
| carrier_iD |     meandelay      |     meancanceled      |
+------------+--------------------+-----------------------+
| EV         | 13.795802119653473 | 0.028584251044292006  |
| VX         | 0.450591016548463  | 2.364066193853424E-4  |
| F9         | 10.898001378359766 | 0.00206753962784287   |
| AS         | 0.5071547420965062 | 0.0057404326123128135 |
| HA         | 1.2031093279839498 | 5.015045135406214E-4  |
| 9E         | 8.147899230704216  | 0.03876067292247866   |
| B6         | 9.45383857757506   | 0.003162096314343487  |
| UA         | 8.101511665305816  | 0.005467725574605967  |
| FL         | 0.7265068895709532 | 0.0041141513746490044 |
| WN         | 7.156119279121648  | 0.0057419058192869415 |
| DL         | 4.206288692245839  | 0.005123990066804269  |
| YV         | 6.316802855264404  | 0.029304029304029346  |
| US         | 3.2221527095063736 | 0.007984031936127766  |
| OO         | 6.954715814690328  | 0.02596499362466706   |
| MQ         | 9.74568222216328   | 0.025628100708354324  |
| AA         | 8.720522654298968  | 0.019242775597574157  |
+------------+--------------------+-----------------------+

I want Hive to return the row with the meanDelay max value. I have:

SELECT CAST(MAX(meandelay) as FLOAT) FROM flightinfo;

which indeed returns the max (I use cast because my values are saved as STRING). So then:

SELECT * FROM flightinfo WHERE meandelay = (SELECT CAST(MAX(meandelay) AS FLOAT) FROM flightinfo);

I get the following error:

FAILED: ParseException line 1:44 cannot recognize input near 'select' 'cast' '(' in expression specification
like image 507
marc Avatar asked Dec 17 '13 19:12

marc


People also ask

How do you SELECT a row with maximum value?

We used the MAX() function within a subquery to find the maximum value, and returned the whole row with the outer query.

How do I find the maximum value of a row in SQL?

To find the maximum value of a column, use the MAX() aggregate function; it takes a column name or an expression to find the maximum value. In our example, the subquery returns the highest number in the column grade (subquery: SELECT MAX(grade) FROM student ).

What is Max () in hive?

The MAX function accepts just one argument, but the GREATEST function requires at least two. The MAX function calls the mapreduce engine, while the GREATEST function does not. If extra columns, in addition to the column given as an input for the MAX function, must be grouped.


2 Answers

Use the windowing and analytics functions

SELECT carrier_id, meandelay, meancanceled
FROM
 (SELECT carrier_id, meandelay, meancanceled,
         rank() over (order by cast(meandelay as float) desc) as r 
  FROM table) S 
WHERE S.r = 1;

This will also solve the problem if more than one row has the same max value, you'll get all the rows as result. If you just want a single row change rank() to row_number() or add another term to the order by.

like image 77
libjack Avatar answered Oct 11 '22 00:10

libjack


use join instead.

SELECT a.* FROM flightinfo a left semi join  
(SELECT CAST(MAX(meandelay) AS FLOAT)  
maxdelay FROM flightinfo)b on (a.meandelay=b.maxdelay)
like image 22
dimamah Avatar answered Oct 10 '22 23:10

dimamah