Hive - how to get the quantile on values per group

Tags: hadoop, hive

How can I calculate the quantile (ntile, or percentile) for a value, for each group of rows of the same item?

For example, for item '101', considering only the rows where 'p' is 1, I would like to know the value needed to be in the top 25%.

create table t1
(item INT,
p INT,
value FLOAT
);

insert into t1 values ('101', '1', '.5');
insert into t1 values ('101', '2', '.4');
insert into t1 values ('101', '1', '.6');
insert into t1 values ('101', '2', '.2');
insert into t1 values ('101', '1', '.7');
insert into t1 values ('101', '2', '.3');
insert into t1 values ('102', '1', '1.5');
insert into t1 values ('102', '2', '1.4');
insert into t1 values ('102', '1', '1.6');
insert into t1 values ('102', '2', '1.2');
insert into t1 values ('102', '1', '1.7');
insert into t1 values ('102', '2', '1.3');

I have tried the following but get an error.

SELECT 
    item,
    p,
    value,
NTILE(4) OVER (ORDER BY value DESC) AS quartile
FROM t1
group by item
where p=1

Error message:

Error while compiling statement: FAILED ParseException line 8:0 missing EOF at 'where' near item

I can do it in R, with a command like:

d[p==1, quantile(value, .75, na.rm=TRUE), by=item]

but I need this in Hadoop for performance reasons.

asked Oct 01 '15 09:10

Timothée HENRY

2 Answers

In Hive, the percentile_approx function returns quantile values. (The exact percentile function only works on integer columns, and 'value' here is a FLOAT.)

The query below finds the 25th, 50th, and 75th percentile values for each item.

 select item,p,percentile_approx(value,array(0.25,0.50,0.75)) 
 from t1 where p=1 group by item,p;

The query below finds a single given percentile (here 0.5, the median) for each item.

select item,p,percentile_approx(value,0.5) 
from t1 where p=1 group by item,p;
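To match the question's "top 25%" threshold (the R call uses quantile .75), the same pattern can be used with 0.75. This is a minimal sketch: the alias q75 and the explicit cast to DOUBLE are illustrative choices, not part of the original answer, and percentile_approx returns an approximation that may differ slightly from R's interpolated result.

```sql
-- Value at the 75th percentile per item, restricted to rows where p = 1.
-- CAST is an assumption added for clarity; percentile_approx works on DOUBLE.
SELECT
    item,
    percentile_approx(CAST(value AS DOUBLE), 0.75) AS q75
FROM t1
WHERE p = 1
GROUP BY item;
```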

answered Oct 06 '22 04:10

anand

The WHERE clause must come before GROUP BY; that ordering is what causes the ParseException.
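As a sketch of how the question's NTILE query might look once the clause order is fixed: NTILE is a window function, so the GROUP BY can be dropped entirely (selecting the non-aggregated columns p and value while grouping only by item would be rejected anyway). The PARTITION BY item clause is an assumption here, added so that quartiles are computed separately for each item.

```sql
-- Assign each row (with p = 1) to a quartile within its item.
-- quartile = 1 holds the highest values because of ORDER BY value DESC.
SELECT
    item,
    p,
    value,
    NTILE(4) OVER (PARTITION BY item ORDER BY value DESC) AS quartile
FROM t1
WHERE p = 1;
```

Note that this labels rows with a bucket number rather than returning a threshold value, so it answers a slightly different question than percentile_approx.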

answered Oct 06 '22 03:10

user3834191
