
Calculate 95th percentile of values with grouping variable

I'm trying to calculate the 95th percentile for multiple water quality values grouped by watershed, for example:

Watershed   WQ
50500101    62.370661
50500101    65.505046
50500101    58.741477
50500105    71.220034
50500105    57.917249

I reviewed an earlier question, "Percentile for Each Observation w/r/t Grouping Variable". It is very close to what I want, but it computes a percentile for EACH observation, whereas I need a single 95th percentile for each level of the grouping variable. So ideally:

Watershed   WQ - 95th
50500101    x
50500105    y
Christine Mazzarella asked Mar 29 '11 13:03


2 Answers

Use a combination of the tapply and quantile functions. For example, if your dataset looks like this:

DF <- data.frame(watershed = sample(c('a','b','c','d'), 1000, replace = TRUE), wq = rnorm(1000))

Use this:

with(DF, tapply(wq, watershed, quantile, probs=0.95))
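
As a rough illustration, the same call can be applied to the five sample rows from the question and reshaped into the two-column layout asked for there. The data frame name `wq_df` and the output column name `WQ_95th` are made up for this sketch; they are not part of the original question or answer.

```r
# Sample rows copied from the question; Watershed is stored as text
# so the IDs are treated as labels rather than numbers.
wq_df <- data.frame(
  Watershed = c("50500101", "50500101", "50500101", "50500105", "50500105"),
  WQ = c(62.370661, 65.505046, 58.741477, 71.220034, 57.917249)
)

# One 95th percentile per watershed, indexed by watershed ID.
p95 <- with(wq_df, tapply(WQ, Watershed, quantile, probs = 0.95))

# Reshape into one row per watershed (column names are hypothetical).
result <- data.frame(Watershed = names(p95), WQ_95th = as.vector(p95))
print(result)
```

With so few observations per group, the value depends on how quantile interpolates between data points; by default it uses type = 7, and a different type argument can give a different answer.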
Vincent answered Sep 30 '22 14:09


I hope I understand your question correctly. Is this what you're looking for?

my.df <- data.frame(group = gl(3, 5), var = runif(15))
aggregate(my.df$var, by = list(my.df$group), FUN = function(x) quantile(x, probs = 0.95))

  Group.1         x
1       1 0.6913747
2       2 0.8067847
3       3 0.9643744

EDIT

Based on Vincent's answer,

aggregate(my.df$var, by = list(my.df$group), FUN = quantile, probs = 0.95)

also works (there are 1001 ways to skin a cat, I've been told). As a side note, you can pass a vector of probabilities to probs, say c(0.1, 0.2, 0.3...) for deciles, to get several quantiles per group in one call. Or you can try the summary function for some predefined statistics:

aggregate(my.df$var, by = list(my.df$group), FUN = summary)
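
To make the side note about a vector of probabilities more concrete, here is a minimal sketch. It assumes the deciles of interest are 10% through 90% and builds them with seq(); the seed and the names `deciles` and `res` are arbitrary choices for the example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
my.df <- data.frame(group = gl(3, 5), var = runif(15))

# Assumed set of deciles: 0.1, 0.2, ..., 0.9
deciles <- seq(0.1, 0.9, by = 0.1)

# Extra arguments after FUN (here probs) are passed on to quantile.
res <- aggregate(my.df$var, by = list(my.df$group), FUN = quantile, probs = deciles)

# When FUN returns several values, column x is a matrix:
# one row per group, one column per requested quantile.
print(dim(res$x))
print(res)
```

The same matrix-in-a-column shape shows up when FUN = summary is used, since summary also returns more than one value per group.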
Roman Luštrik answered Sep 30 '22 14:09