Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Hive ntile results in where clause

I want to get summary data of the first quartile for a table in Hive. Below is a query to get the maximum number of views in each quartile:

SELECT NTILE(4) OVER (ORDER BY total_views) AS quartile, MAX(total_views)
FROM view_data
GROUP BY quartile
ORDER BY quartile;

And this query is to get the names of all the people that are in the first quartile:

SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data
WHERE quartile = 1

I get this error for both queries:

Invalid table alias or column reference 'quartile'

How can I reference the ntile results in the where clause or group by clause?

like image 857
Nadine Avatar asked Jul 21 '15 13:07

Nadine


People also ask

What does NTILE function do?

The NTILE window function divides ordered rows in the partition into the specified number of ranked groups of as equal size as possible and returns the group that a given row falls into.

What is Ntile in hive?

NTILE is an analytic function. It divides an ordered data set into a number of buckets indicated by expr and assigns the appropriate bucket number to each row. The buckets are numbered 1 through expr .

When to use NTILE in SQL?

NTILE() in Standard Query Language (SQL) is a window function that is used to divide sorted rows of a partition into a specified number of equal size buckets or groups. Each bucket is assigned a rank starting from 1. Each row in the partition is assigned a bucket number based on the group to which it belongs.

What is lead function in hive?

LEAD function:The number (value_expr) of rows to lead can optionally be specified. If the number of rows (offset) to lead is not specified, the lead is one row by default. It returns [,default] or null when the default is not specified and the lead for the current row extends beyond the end of the window.


1 Answers

You can't put a windowing function in a where clause because it would create ambiguity if there are compound predicates. So use a subquery.

select quartile, max(total_views) from
(SELECT total_views, NTILE(4) OVER (ORDER BY total_views) AS quartile,
FROM view_data) t
GROUP BY quartile
ORDER BY quartile
;

and

select * from 
(SELECT name, NTILE(4) OVER (ORDER BY total_views) AS quartile
FROM view_data) t
WHERE quartile = 1
;
like image 171
invoketheshell Avatar answered Nov 03 '22 10:11

invoketheshell