There are a lot of answers regarding to plotting confidence intervals.
I'm reading the paper by Lourme A. et al (2016) and I'd like to draw the 90% confidence boundary and the 10% exceptional points like in the Fig. 2 from the paper: .
I can't use LaTeX and insert the picture with the definition of confidence areas:
library("MASS")
library(copula)
set.seed(612)
n <- 1000 # length of sample
d <- 2 # dimension
# random vector with uniform margins on (0,1)
u1 <- runif(n, min = 0, max = 1)
u2 <- runif(n, min = 0, max = 1)
u = matrix(c(u1, u2), ncol=d)
Rg <- cor(u) # d-by-d correlation matrix
Rg1 <- ginv(Rg) # inv. matrix
# round(Rg %*% Rg1, 8) # check
# the multivariate c.d.f of u is a Gaussian copula
# with parameter Rg[1,2]=0.02876654
normal.cop = normalCopula(Rg[1,2], dim=d)
fit.cop = fitCopula(normal.cop, u, method="itau") #fitting
# Rg.hat = fit.cop@estimate[1]
# [1] 0.03097071
sim = rCopula(n, normal.cop) # in (0,1)
# Taking the quantile function of N1(0, 1)
y1 <- qnorm(sim[,1], mean = 0, sd = 1)
y2 <- qnorm(sim[,2], mean = 0, sd = 1)
par(mfrow=c(2,2))
plot(y1, y2, col="red"); abline(v=mean(y1), h=mean(y2))
plot(sim[,1], sim[,2], col="blue")
hist(y1); hist(y2)
Reference. Lourme, A., F. Maurer (2016) Testing the Gaussian and Student's t copulas in a risk management framework. Economic Modelling.
Question. Could anyone help me and give the explanation of the variable v=(v_1,...,v_d)
and G(v_1),..., G(v_d)
in the equation?
I think v
is the non-random matrix, the dimensions should be $k^2$ (grid points) by d=2
(dimensions). For example,
axis_x <- seq(0, 1, 0.1) # 11 grid points
axis_y <- seq(0, 1, 0.1) # 11 grid points
v <- expand.grid(axis_x, axis_y)
plot(v, type = "p")
So, your question is about the vector nu
and correponding G(nu)
.
nu
is a simple random vector drawn from any distribution that has a domain (0,1). (Here I use uniform distribution). Since you want your samples in 2D one single nu
can be nu = runif(2)
. Given the explanations above, G
is a gaussain pdf with mean 0 and a covariance matrix Rg
. (Rg has dimensions of 2x2 in 2D).
Now what the paragraph says: if you have a random sample nu
and you want it to be drawn from Gamma
given the number of dimensions d
and confidence level alpha
then you need to compute the following statistic (G(nu) %*% Rg^-1) %*% G(nu)
and check that is below the pdf of Chi^2 distribution for d
and alpha
.
For example:
# This is the copula parameter
Rg <- matrix(c(1,runif(2),1), ncol = 2)
# But we need to compute the inverse for sampling
Rginv <- MASS::ginv(Rg)
sampleResult <- replicate(10000, {
# we draw our nu from uniform, but others that map to (0,1), e.g. beta, are possible, too
nu <- runif(2)
# we compute G(nu) which is a gaussian cdf on the sample
Gnu <- qnorm(nu, mean = 0, sd = 1)
# for this we compute the statistic as given in formula
stat <- (Gnu %*% Rginv) %*% Gnu
# and return the result
list(nu = nu, Gnu = Gnu, stat = stat)
})
theSamples <- sapply(sampleResult["nu",], identity)
# this is the critical value of the Chi^2 with alpha = 0.95 and df = number of dimensions
# old and buggy threshold <- pchisq(0.95, df = 2)
# new and awesome - we are looking for the statistic at alpha = .95 quantile
threshold <- qchisq(0.95, df = 2)
# we can accept samples given the threshold (like in equation)
inArea <- sapply(sampleResult["stat",], identity) < threshold
plot(t(theSamples), col = as.integer(inArea)+1)
The red points are the points you would keep (I plot all points here).
As for drawing the decision boundries, I think it is a little bit more complicated, since you need to compute the exact pair of nu
so that (Gnu %*% Rginv) %*% Gnu == pchisq(alpha, df = 2)
. It is a linear system that you solve for Gnu
and then apply inverse to get your nu
at the decision boundries.
edit: Reading the paragraph again, I noticed, the parameter for Gnu does not change, it is simply Gnu <- qnorm(nu, mean = 0, sd = 1)
.
edit: There was a bug: for threshold you need to use the quantile function qchisq
instead of the distribution function pchisq
- now corrected in the code above (and updated the figures).
This has two parts: first, compute the copula value as a function of X and Y; then, plot the curve giving the boundary where the copula exceeds the threshold.
Computing the value is basically linear algebra which @drey has answered. This is a rewritten version so that the copula is given by a function.
cop1 <- function(x)
{
Gnu <- qnorm(x)
Gnu %*% Rginv %*% Gnu
}
copula <- function(x)
{
apply(x, 1, cop1)
}
Plotting the boundary curve can be done using the same method as here (which in turn is the method used by the textbooks Modern Applied Stats with S, and Elements of Stat Learning). Create a grid of values, and use interpolation to find the contour line at the given height.
Rg <- matrix(c(1,runif(2),1), ncol = 2)
Rginv <- MASS::ginv(Rg)
# draw the contour line where value == threshold
# define a grid of values first: avoid x and y = 0 and 1, where infinities exist
xlim <- 1e-3
delta <- 1e-3
xseq <- seq(xlim, 1-xlim, by=delta)
grid <- expand.grid(x=xseq, y=xseq)
prob.grid <- copula(grid)
threshold <- qchisq(0.95, df=2)
contour(x=xseq, y=xseq, z=matrix(prob.grid, nrow=length(xseq)), levels=threshold,
col="grey", drawlabels=FALSE, lwd=2)
# add some points
data <- data.frame(x=runif(1000), y=runif(1000))
points(data, col=ifelse(copula(data) < threshold, "red", "black"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With