I'm trying to calculate the 95th percentile for multiple water quality values grouped by watershed, for example:
Watershed WQ
50500101 62.370661
50500101 65.505046
50500101 58.741477
50500105 71.220034
50500105 57.917249
I reviewed a previously posted question, "Percentile for Each Observation w/r/t Grouping Variable". It seems very close to what I want to do, but it computes a percentile for EACH observation. I need one value for each group, so ideally:
Watershed WQ - 95th
50500101 x
50500105 y
Below are some of the steps to get the 95th percentile of a given data set in Excel. The first step is to enter the data into an empty sheet: open an Excel workbook and record names in one column and marks in the second column.
This is a standard measure used in interpreting performance data. The 95th percentile is the highest value left when the top 5% of a numerically sorted set of collected data is discarded. It serves as a measure of the peak value once a fair amount of transitory spikes has been discounted, which makes it markedly different from the average.
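The discard-the-top-5% description can be checked directly in R. This is only a minimal sketch on a toy vector (the integers 1 to 100, an assumption for illustration, not data from the question). It also shows that `quantile()` with its default settings interpolates between sorted values, so its answer can differ slightly from the discard rule.

```r
# Toy data (an assumption for illustration): the integers 1 to 100.
x <- 1:100

# Sort, drop the top 5% of values, and keep the largest value that remains.
n_drop <- floor(0.05 * length(x))
kept <- head(sort(x), length(x) - n_drop)
p95_discard <- max(kept)

# quantile() interpolates between order statistics by default (type = 7),
# so its result need not equal the discard-rule value exactly.
p95_default <- unname(quantile(x, probs = 0.95))

print(c(discard = p95_discard, default = p95_default, mean = mean(x)))
```

Which definition is appropriate depends on the reporting convention in use; the `type` argument of `quantile()` selects among several.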
Percentiles for grouped data. Percentiles are the values that divide the whole distribution into one hundred equal parts. There are 99 of them, namely $P_1, P_2, \cdots, P_{99}$, where $P_1$ is the first percentile, $P_2$ is the second percentile, and so on. In the formula for the $i$-th percentile of a discrete frequency distribution, $N$ is the total number of observations.
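The formula itself is not reproduced in the text above. One common textbook convention, assumed here rather than taken from the source, is to take the i-th percentile of a discrete frequency distribution as the smallest value whose cumulative frequency reaches i * N / 100. A rough sketch with made-up values and frequencies (the function name `percentile_freq` is hypothetical):

```r
# Hypothetical helper: i-th percentile of a discrete frequency distribution,
# assuming the "smallest value whose cumulative frequency >= i*N/100" rule.
percentile_freq <- function(values, freq, i) {
  ord <- order(values)                 # make sure values are ascending
  values <- values[ord]
  freq <- freq[ord]
  N <- sum(freq)                       # total number of observations
  target <- i * N / 100
  values[which(cumsum(freq) >= target)[1]]
}

# Made-up example data: five distinct values with their frequencies (N = 20).
values <- c(10, 20, 30, 40, 50)
freq   <- c(4, 6, 5, 3, 2)

p50 <- percentile_freq(values, freq, 50)
p95 <- percentile_freq(values, freq, 95)
print(c(P50 = p50, P95 = p95))
```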
Use a combination of the tapply and quantile functions. For example, if your dataset looks like this:
DF <- data.frame(watershed = sample(c('a','b','c','d'), 1000, replace = TRUE), wq = rnorm(1000))
Use this:
with(DF, tapply(wq, watershed, quantile, probs=0.95))
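For reference, here is the same `tapply` call applied to the five sample rows from the question. This is a sketch only: the data frame name `wq_df` and the output column name `WQ_95th` are made up for illustration, and with so few rows per group the result is an interpolation between the largest values.

```r
# The five sample rows from the question (object name is illustrative).
wq_df <- data.frame(
  Watershed = c(50500101, 50500101, 50500101, 50500105, 50500105),
  WQ        = c(62.370661, 65.505046, 58.741477, 71.220034, 57.917249)
)

# One 95th percentile per watershed; tapply returns a vector named by group.
p95 <- with(wq_df, tapply(WQ, Watershed, quantile, probs = 0.95))

# Reshape into the two-column layout asked for in the question.
result <- data.frame(Watershed = names(p95), WQ_95th = as.vector(p95))
print(result)
```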
I hope I understand your question correctly. Is this what you're looking for?
my.df <- data.frame(group = gl(3, 5), var = runif(15))
aggregate(my.df$var, by = list(my.df$group), FUN = function(x) quantile(x, probs = 0.95))
  Group.1         x
1       1 0.6913747
2       2 0.8067847
3       3 0.9643744
EDIT
Based on Vincent's answer,
aggregate(my.df$var, by = list(my.df$group), FUN = quantile, probs = 0.95)
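Another variant, not mentioned in the answers above, is the formula interface of `aggregate`, which keeps the original column names instead of `Group.1` and `x`. A small sketch, with `set.seed` added only so the random data is reproducible:

```r
# Same toy data as above; the seed is an addition for reproducibility.
set.seed(1)
my.df <- data.frame(group = gl(3, 5), var = runif(15))

# Formula interface: "var by group"; extra arguments are passed on to quantile.
by_formula <- aggregate(var ~ group, data = my.df, FUN = quantile, probs = 0.95)

# The by = list(...) form for comparison; values match, column names differ.
by_list <- aggregate(my.df$var, by = list(my.df$group), FUN = quantile, probs = 0.95)

print(by_formula)
print(by_list)
```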
also works (you can skin a cat 1001 ways - I've been told). A side note, you can specify a vector of desired -iles, say c(0.1, 0.2, 0.3...) for deciles. Or you can try function summary for some predefined statistics.
aggregate(my.df$var, by = list(my.df$group), FUN = summary)
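To illustrate the side note about passing a vector of probabilities, here is one possible sketch for per-group deciles. It uses `split` plus `sapply` rather than `aggregate` so the result is a plain matrix with one column per group; the `seq()` call and the seed are choices made for this example.

```r
# Same toy data; the seed is an addition for reproducibility.
set.seed(1)
my.df <- data.frame(group = gl(3, 5), var = runif(15))

# Probabilities for the nine deciles (10%, 20%, ..., 90%).
decile_probs <- seq(0.1, 0.9, by = 0.1)

# Split var by group, then compute all deciles for each group.
# Result: a matrix with one row per decile and one column per group.
deciles <- sapply(split(my.df$var, my.df$group), quantile, probs = decile_probs)
print(round(deciles, 3))
```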