I have data like this:
df:
Group Point
A 6000
B 5000
C 1000
D 100
F 70
Before I graph this df, I would like to remove the values above the 95th percentile from my data frame. Can anybody tell me how to do that?
A percentile is the value at a particular rank. For example, if your score on a test is at the 95th percentile, a common interpretation is that only 5% of the scores were higher than yours.
In Excel, enter the following formula into a cell, excluding quotes: "=PERCENTILE.EXC(A1:AX,k)", where "X" is the last row in column "A" that contains data and "k" is the percentile you are looking for, written as a fraction between 0 and 1 (for example, 0.95).
What's the 95th percentile? In networking, the 95th percentile is the highest value remaining after the top 5% of a data set is removed. For example, if you have 100 data points, you begin by removing the five largest values. The highest value left represents the 95th percentile.
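The rank-based rule described above does not always give the same number as R's quantile(), which interpolates between neighbouring points by default. A minimal sketch of the difference, assuming a made-up sample of the integers 1 to 100 (the sample and the variable names are illustrative only, not part of the question):

```r
# Hypothetical sample: 100 data points, the integers 1 to 100
x <- 1:100

# Rank-based rule: drop the top 5% of points, then take the highest survivor
n_drop <- floor(length(x) * 5 / 100)          # 5 points for this sample
kept <- head(sort(x), length(x) - n_drop)     # the 95 smallest values
p95_rank <- max(kept)                         # 95

# quantile() interpolates between neighbouring points by default
p95_interp <- unname(quantile(x, 0.95))       # about 95.05

print(c(rank_based = p95_rank, interpolated = p95_interp))
```

With only five rows, as in the question, the interpolation matters more: the cutoff of 5800 falls between the two largest values rather than on one of them.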
Use the quantile function:
> quantile(df$Point, 0.95)
95%
5800
> df[df$Point < quantile(df$Point, 0.95), ]
Group Point
2 B 5000
3 C 1000
4 D 100
5 F 70
Or, using the 'dplyr' library:
> library(dplyr)
> df %>% filter(Point < quantile(Point, 0.95))
Group Point
1 B 5000
2 C 1000
3 D 100
4 F 70
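For anyone who wants to reproduce this from scratch, here is a self-contained base R sketch. It rebuilds the data frame from the question and stores the cutoff in a variable; the names cutoff and trimmed are just illustrative. It uses <= rather than <, on the assumption that "remove values over the 95th percentile" means a value exactly equal to the cutoff should be kept. For this data both comparisons give the same result.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  Group = c("A", "B", "C", "D", "F"),
  Point = c(6000, 5000, 1000, 100, 70)
)

# 95th percentile cutoff, using quantile()'s default interpolation
cutoff <- quantile(df$Point, 0.95)

# Keep rows at or below the cutoff, i.e. drop only the values above it
trimmed <- df[df$Point <= cutoff, ]
print(trimmed)
```

The trimmed data frame can then be passed to whatever plotting call you use in place of df.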