I have a pandas DataFrame called data with a column called ms. I want to eliminate all the rows where data.ms is above the 95th percentile. For now, I'm doing this:
limit = data.ms.describe(90)['95%']
valid_data = data[data['ms'] < limit]
which works, but I want to generalize that to any percentile. What's the best way to do that?
Percentile: a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall.
Quantile: values taken from regular intervals of the quantile function of a random variable.
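To make the relationship between the two terms concrete, here is a minimal sketch showing that a percentile p on a 0 to 100 scale corresponds to the quantile p / 100 on a 0 to 1 scale. The sample values are made up for illustration, and the comparison assumes both Series.quantile() and numpy.percentile() are left at their default linear interpolation.

```python
import numpy as np
import pandas as pd

# Made-up sample values, used only for illustration.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

p = 95  # percentile, on a 0-100 scale

# Series.quantile() expects a fraction between 0 and 1.
by_quantile = values.quantile(p / 100)

# numpy.percentile() expects a value between 0 and 100.
by_percentile = np.percentile(values.to_numpy(), p)

# With default (linear) interpolation the two agree.
print(by_quantile, by_percentile)
```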
Why use it? The 95th percentile is useful in measuring network usage because it gives an accurate picture of what that usage costs. Knowing the value of your network's 95th percentile makes spikes in usage easy to identify.
Use the Series.quantile() method:
In [46]: from numpy.random import randn
In [47]: from pandas import DataFrame
In [48]: cols = list('abc')
In [49]: df = DataFrame(randn(10, len(cols)), columns=cols)
In [50]: df.a.quantile(0.95)
Out[50]: 1.5776961953820687
To filter out rows of df where df.a is greater than or equal to the 95th percentile, do:
In [72]: df[df.a < df.a.quantile(.95)]
Out[72]:
       a      b      c
0 -1.044 -0.247 -1.149
2  0.395  0.591  0.764
3 -0.564 -2.059  0.232
4 -0.707 -0.736 -1.345
5  0.978 -0.099  0.521
6 -0.974  0.272 -0.649
7  1.228  0.619 -0.849
8 -0.170  0.458 -0.515
9  1.465  1.019  0.966
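To generalize this to any percentile, as the question asks, the same filter can be wrapped in a small function. The sketch below is one possible way to do it, not an official pandas API: the name drop_above_percentile and its parameters are hypothetical, and the sample data is made up. It keeps the strict < comparison used above, so rows equal to the limit are dropped as well.

```python
import pandas as pd


def drop_above_percentile(frame, column, percentile):
    """Keep rows whose `column` value is strictly below the given percentile.

    Hypothetical helper: `percentile` is on a 0-100 scale and is converted
    to the 0-1 fraction that Series.quantile() expects.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    limit = frame[column].quantile(percentile / 100)
    return frame[frame[column] < limit]


# Made-up data with one obvious outlier.
data = pd.DataFrame({"ms": [10, 20, 30, 40, 1000]})

valid_data = drop_above_percentile(data, "ms", 95)
print(valid_data)
```

Changing the comparison to <= would keep rows that sit exactly at the limit; which one is appropriate depends on how the cutoff is meant to be read.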