Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Advice on calculating a function to describe upper bound of data

I have a scatter plot of a dataset and I am interested in calculating the upper bound of the data. I don't know if this is a standard statistical approach so what I was considering doing was splitting the X-axis data into small ranges, calculating the max for these ranges and then trying to identify a function to describe these points. Is there a function already in R to do this?

If it's relevant there are 92611 points.

alt text

like image 364
djq Avatar asked Dec 12 '22 16:12

djq


1 Answers

You might like to look into quantile regression, which is available in the quantreg package. Whether this is useful will depend on whether you want the absolute maximum within your "windows" are whether some extreme quantile, say 95th or 99th, is acceptable? If you are not familiar with quantile regression, then consider the linear regression which fits a model for the expectation or mean response, conditional upon the model covariates. Quantile regression for the middle quantile (0.5) would fit a model to the median response, conditional upon the model covariates.

Here is an example using the quantreg package, to show you what I mean. First, generate some dummy data similar to the data you show:

set.seed(1)
N <- 5000
DF <- data.frame(Y = rev(sort(rlnorm(N, -0.9))) + rnorm(N),
                 X = seq_len(N))
plot(Y ~ X, data = DF)

Next, fit the model to the 99th percentile (or the 0.99 quantile):

mod <- rq(Y ~ log(X), data = DF, tau = .99)

To generate the "fitted line", we predict from the model at 100 equally spaced values in X

pDF <- data.frame(X = seq(1, 5000, length = 100))
pDF <- within(pDF, Y <- predict(mod, newdata = pDF))

and add the fitted model to the plot:

lines(Y ~ X, data = pDF, col = "red", lwd = 2)

This should give you this:

quantile regression output

like image 123
Gavin Simpson Avatar answered Jan 18 '23 01:01

Gavin Simpson