I have a scatter plot of a dataset and I am interested in calculating the upper bound of the data. I don't know if this is a standard statistical approach so what I was considering doing was splitting the X-axis data into small ranges, calculating the max for these ranges and then trying to identify a function to describe these points. Is there a function already in R to do this?
If it's relevant there are 92611 points.
You might like to look into quantile regression, which is available in the quantreg package. Whether this is useful will depend on whether you want the absolute maximum within your "windows" are whether some extreme quantile, say 95th or 99th, is acceptable? If you are not familiar with quantile regression, then consider the linear regression which fits a model for the expectation or mean response, conditional upon the model covariates. Quantile regression for the middle quantile (0.5) would fit a model to the median response, conditional upon the model covariates.
Here is an example using the quantreg package, to show you what I mean. First, generate some dummy data similar to the data you show:
set.seed(1)
N <- 5000
DF <- data.frame(Y = rev(sort(rlnorm(N, -0.9))) + rnorm(N),
X = seq_len(N))
plot(Y ~ X, data = DF)
Next, fit the model to the 99th percentile (or the 0.99 quantile):
mod <- rq(Y ~ log(X), data = DF, tau = .99)
To generate the "fitted line", we predict from the model at 100 equally spaced values in X
pDF <- data.frame(X = seq(1, 5000, length = 100))
pDF <- within(pDF, Y <- predict(mod, newdata = pDF))
and add the fitted model to the plot:
lines(Y ~ X, data = pDF, col = "red", lwd = 2)
This should give you this:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With