I am trying find the probability that a point lies within an ellipse? For eg if I was plotting the bivariate data (x,y) for 300 datasets in an 95% ellipsoid region, how do I calculate how many times out of 300 will my points fall inside the ellipse?
Heres the code I am using
library(MASS)
seed<-1234
x<-NULL
k<-1
Sigma2 <- matrix(c(.72,.57,.57,.46),2,2)
Sigma2
rho <- Sigma2[1,2]/sqrt(Sigma2[1,1]*Sigma2[2,2])
rho
eta1<-replicate(300,mvrnorm(k, mu=c(-1.59,-2.44), Sigma2))
library(car)
dataEllipse(eta1[1,],eta1[2,], levels=c(0.05, 0.95))
Thanks for your help.
I don't see why people are jumping on the OP. In context, it's clearly a programming question: it's about getting the empirical frequency of data points within a given ellipse, not a theoretical probability. The OP even posted code and a graph showing what they're trying to obtain.
It may be that they don't fully understand the statistical theory behind a 95% ellipse, but they didn't ask about that. Besides, making plots and calculating frequencies like this is an excellent way of coming to grips with the theory.
Anyway, here's some code that answers the narrowly-defined question of how to count the points within an ellipse obtained via a normal distribution (which is what underlies dataEllipse
). The idea is to transform your data to the unit circle via principal components, then get the points within a certain radius of the origin.
within.ellipse <- function(x, y, plot.ellipse=TRUE)
{
if(missing(y) && is.matrix(x) && ncol(x) == 2)
{
y <- x[,2]
x <- x[,1]
}
if(plot.ellipse)
dataEllipse(x, y, levels=0.95)
d <- scale(prcomp(cbind(x, y), scale.=TRUE)$x)
rad <- sqrt(2 * qf(.95, 2, nrow(d) - 1))
mean(sqrt(d[,1]^2 + d[,2]^2) < rad)
}
It was also commented that a 95% data ellipse contains 95% of the data by definition. This is certainly not true, at least for normal-theory ellipses. If your distribution is particularly bad, the coverage frequency may not even converge to the assumed level as the sample size increases. Consider a generalised pareto distribution, for example:
library(evd) # for rgpd
# generalised pareto has no variance for shape > 0.5
z <- sapply(1:1000, function(...) within.ellipse(rgpd(100, shape=5), rgpd(100, shape=5), FALSE))
mean(z)
[[1] 0.97451
z <- sapply(1:1000, function(...) within.ellipse(rgpd(10000, shape=5), rgpd(10000, shape=5), FALSE))
mean(z)
[1] 0.9995808
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With