How does the percentile function work in Hive?

Imagine the following column called id:

68 69 43 54 56 61 69 70 71 72 77 78 79 85 87 88 89 93 95 96 98 99 99 62 66

If I do the following: percentile(id, 0.9), the output is 97.2. What is going on?

Pratik Garg asked Jan 19 '17 06:01


People also ask

How do you use the percentile function?

Enter the following formula into the cell, excluding quotes: "=PERCENTILE.EXC(A1:AX,k)", where "X" is the last row in column "A" that contains data and "k" is the percentile you are looking for.

How do you find the percentile in hive?

Use PERCENTILE_APPROX if your input is non-integral. Returns an approximate pth percentile of a numeric column (including floating point types) in the group. The B parameter controls approximation accuracy at the cost of memory. Higher values yield better approximations, and the default is 10,000.

How do you interpret the percentile?

A percentile is the value at a particular rank. For example, if your score on a test is at the 95th percentile, a common interpretation is that only 5% of the scores were higher than yours. The median is the 50th percentile; it is commonly assumed that 50% of the values in a data set are above the median.


1 Answer

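If you pass 0.9, you expect about 90% of the data you give to the function to fall below the returned value. Sorted, your 25 values end in 93 95 96 98 99 99, so the 22nd value is 96 and the 23rd is 98. 90% of 25 is 22.5, so the result should land between those two, and 97.2 does. The exact figure is consistent with linear interpolation between the neighbouring values: the zero-based position is 0.9 × (25 − 1) = 21.6, which gives 96 + 0.6 × (98 − 96) = 97.2.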
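
To check the arithmetic outside Hive, here is a minimal Python sketch. It assumes the percentile is taken by linear interpolation at the zero-based position p × (n − 1) in the sorted data, which reproduces the 97.2 from the question. The function name percentile_linear is made up for this example; it is not a Hive API.

```python
import math

def percentile_linear(values, p):
    """Percentile by linear interpolation at zero-based position p * (n - 1).

    Assumption: this mirrors the behaviour observed in the question;
    it is an illustration, not Hive's actual code.
    """
    ordered = sorted(values)
    position = p * (len(ordered) - 1)   # 0.9 * 24 = 21.6 for the sample data
    lower = math.floor(position)        # index of the value just below
    upper = math.ceil(position)         # index of the value just above
    if lower == upper:                  # position falls exactly on one value
        return float(ordered[lower])
    # Weight each neighbour by how close the position is to it.
    return (upper - position) * ordered[lower] + (position - lower) * ordered[upper]

# The id column from the question.
ids = [68, 69, 43, 54, 56, 61, 69, 70, 71, 72, 77, 78, 79,
       85, 87, 88, 89, 93, 95, 96, 98, 99, 99, 62, 66]

result = percentile_linear(ids, 0.9)
print(round(result, 1))
```

Here the position 21.6 sits between index 21 (value 96) and index 22 (value 98), so the result is 0.4 × 96 + 0.6 × 98.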

Andrea answered Nov 29 '22 16:11