 

Remove data greater than the 95th percentile in a data frame


I have data like this:

df:

Group   Point
A       6000
B       5000
C       1000
D        100
F        70
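To make the example reproducible, the table above can be rebuilt in R roughly like this. The name df comes from the question; storing Group as character rather than factor is an assumption.

```r
# Rebuild the example data from the question
df <- data.frame(
  Group = c("A", "B", "C", "D", "F"),
  Point = c(6000, 5000, 1000, 100, 70),
  stringsAsFactors = FALSE   # keep Group as plain character strings
)
str(df)
```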

Before I graph this df, I'd like to remove only the values above the 95th percentile. Can anybody tell me how to do that?

asked Sep 20 '12 19:09 by user1471980


People also ask

How do you interpret the 95th percentile?

A percentile is the value at a particular rank. For example, if your score on a test is at the 95th percentile, a common interpretation is that only 5% of the scores were higher than yours.
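A quick way to check this interpretation in R is to count how many values lie above the 95th percentile. The vector 1:100 is made-up illustration data, not from the question.

```r
# Illustration data: the integers 1 to 100
x <- 1:100

# R's default (type = 7) 95th percentile of x
p95 <- quantile(x, 0.95)
print(p95)           # named value, 95.05

# Fraction of values strictly above the 95th percentile
share_above <- mean(x > p95)
print(share_above)   # 0.05, i.e. the five values 96..100
```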

How do you break data into percentiles?

In Excel, enter the following formula into a cell, excluding quotes: "=PERCENTILE.EXC(A1:AX,k)", where "X" is the last row in column "A" that contains data, and "k" is the percentile you are looking for, expressed as a fraction between 0 and 1 (for example, 0.95 for the 95th percentile).
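In R, quantile() offers several percentile definitions through its type argument. As far as I know, Excel's PERCENTILE.INC matches R's default type = 7 and PERCENTILE.EXC matches type = 6 (the p(n + 1) rule); treat that mapping as an assumption to verify against Excel's documentation. One difference at the extremes: for a tiny sample like the question's five points, Excel's PERCENTILE.EXC reports a #NUM! error for k = 0.95, while R's type = 6 simply clamps to the maximum.

```r
# Example values from the question
points <- c(6000, 5000, 1000, 100, 70)

# Inclusive definition, R's default (assumed to match PERCENTILE.INC)
inc <- quantile(points, 0.95, type = 7)

# Exclusive p(n + 1) definition (assumed to match PERCENTILE.EXC);
# 0.95 * (5 + 1) = 5.7 is past the last observation, so R clamps to the max
exc <- quantile(points, 0.95, type = 6)

print(c(inclusive = unname(inc), exclusive = unname(exc)))   # 5800 and 6000
```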

What is the 95th percentile value?

What's the 95th percentile? In networking, the 95th percentile is the highest value remaining after the top 5% of a data set is removed. For example, if you have 100 data points, you begin by removing the five largest values. The highest value left represents the 95th percentile.
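The networking definition above can be written directly in R: sort, drop the largest 5% of points, and take the maximum of what remains. The helper name drop_top_share is hypothetical and 1:100 is illustration data. Note that this rule gives a slightly different number than quantile()'s default interpolation.

```r
# Hypothetical helper: highest value left after discarding the top `share`
# fraction of observations (networking-style 95th percentile)
drop_top_share <- function(x, share = 0.05) {
  x <- sort(x)
  n_drop <- floor(length(x) * share)   # e.g. 5 of 100 points
  max(x[seq_len(length(x) - n_drop)])
}

x <- 1:100
drop_top_share(x)    # 95: the five largest values (96..100) are removed
quantile(x, 0.95)    # 95.05 with R's default type = 7 interpolation
```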


2 Answers

Use the quantile function:

> quantile(df$Point, 0.95)
 95% 
5800 

> df[df$Point < quantile(df$Point, 0.95), ]
  Group Point
2     B  5000
3     C  1000
4     D   100
5     F    70
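Where does 5800 come from? With its default type = 7, quantile() places the p-th quantile at position h = (n - 1)p + 1 in the sorted data and interpolates linearly between the neighbouring values. The sketch below redoes that by hand for the question's data; it only illustrates the default rule for this p and is not a replacement for quantile().

```r
points <- sort(c(6000, 5000, 1000, 100, 70))   # 70 100 1000 5000 6000
p <- 0.95
n <- length(points)

h  <- (n - 1) * p + 1   # 4.8: between the 4th and 5th sorted values
lo <- floor(h)          # 4
manual <- points[lo] + (h - lo) * (points[lo + 1] - points[lo])

manual                        # 5000 + 0.8 * 1000 = 5800
unname(quantile(points, p))   # same value from quantile()
```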
answered Sep 28 '22 08:09 by GSee


Or, using the dplyr library:

> library(dplyr)
> quantile(df$Point, 0.95)
 95% 
5800

> df %>% filter(Point < quantile(Point, 0.95))
  Group Point
1     B  5000
2     C  1000
3     D   100
4     F    70
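Two details are easy to miss in both answers: `<` also discards a value sitting exactly on the 95th percentile, and quantile() stops with an error if the column contains NA unless na.rm = TRUE is passed. The helper below is a hypothetical base-R sketch that handles both; its name and arguments are illustrative, not part of any package.

```r
# Hypothetical helper: keep rows whose `col` value is at or below the
# p-th quantile; rows with NA in `col` are dropped as well
keep_below_quantile <- function(data, col, p = 0.95) {
  cutoff <- quantile(data[[col]], p, na.rm = TRUE)
  data[!is.na(data[[col]]) & data[[col]] <= cutoff, , drop = FALSE]
}

df <- data.frame(
  Group = c("A", "B", "C", "D", "F"),
  Point = c(6000, 5000, 1000, 100, 70),
  stringsAsFactors = FALSE
)
keep_below_quantile(df, "Point")   # rows B, C, D, F (6000 > 5800)
```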
answered Sep 28 '22 08:09 by swojtasiak