Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to draw an $\alpha$ confidence areas on a 2D-plot?

There are a lot of answers regarding to plotting confidence intervals.

I'm reading the paper by Lourme A. et al (2016) and I'd like to draw the 90% confidence boundary and the 10% exceptional points like in the Fig. 2 from the paper: enter image description here.

I can't use LaTeX and insert the picture with the definition of confidence areas: enter image description here

library("MASS")
library(copula)
set.seed(612)

n <- 1000 # length of sample
d <- 2    # dimension

# random vector with uniform margins on (0,1)
u1 <- runif(n, min = 0, max = 1)
u2 <- runif(n, min = 0, max = 1)

u = matrix(c(u1, u2), ncol=d)

Rg  <- cor(u)   # d-by-d correlation matrix
Rg1 <- ginv(Rg) # inv. matrix 

# round(Rg %*% Rg1, 8) # check

# the multivariate c.d.f of u is a Gaussian copula 
# with parameter Rg[1,2]=0.02876654

normal.cop = normalCopula(Rg[1,2], dim=d)
fit.cop    = fitCopula(normal.cop, u, method="itau") #fitting
# Rg.hat     = fit.cop@estimate[1]
# [1] 0.03097071
sim        = rCopula(n, normal.cop) # in (0,1)

# Taking the quantile function of N1(0, 1)

y1 <- qnorm(sim[,1], mean = 0, sd = 1)
y2 <- qnorm(sim[,2], mean = 0, sd = 1)

par(mfrow=c(2,2))

plot(y1, y2, col="red");  abline(v=mean(y1), h=mean(y2))
plot(sim[,1], sim[,2], col="blue")
hist(y1); hist(y2)

Reference. Lourme, A., F. Maurer (2016) Testing the Gaussian and Student's t copulas in a risk management framework. Economic Modelling.

Question. Could anyone help me and give the explanation of the variable v=(v_1,...,v_d) and G(v_1),..., G(v_d) in the equation?

I think v is the non-random matrix, the dimensions should be $k^2$ (grid points) by d=2 (dimensions). For example,

axis_x <- seq(0, 1, 0.1) # 11 grid points
axis_y <- seq(0, 1, 0.1) # 11 grid points
v <- expand.grid(axis_x, axis_y)
plot(v,  type = "p")
like image 661
Nick Avatar asked Feb 18 '17 04:02

Nick


Video Answer


2 Answers

So, your question is about the vector nu and correponding G(nu).

nu is a simple random vector drawn from any distribution that has a domain (0,1). (Here I use uniform distribution). Since you want your samples in 2D one single nu can be nu = runif(2). Given the explanations above, G is a gaussain pdf with mean 0 and a covariance matrix Rg. (Rg has dimensions of 2x2 in 2D).

Now what the paragraph says: if you have a random sample nu and you want it to be drawn from Gamma given the number of dimensions d and confidence level alpha then you need to compute the following statistic (G(nu) %*% Rg^-1) %*% G(nu) and check that is below the pdf of Chi^2 distribution for d and alpha.

For example:

# This is the copula parameter
Rg <- matrix(c(1,runif(2),1), ncol = 2)
# But we need to compute the inverse for sampling
Rginv <- MASS::ginv(Rg)

sampleResult <- replicate(10000, {
  # we draw our nu from uniform, but others that map to (0,1), e.g. beta, are possible, too
  nu <- runif(2)
  # we compute G(nu) which is a gaussian cdf on the sample
  Gnu <- qnorm(nu, mean = 0, sd = 1)
  # for this we compute the statistic as given in formula
  stat <- (Gnu %*% Rginv) %*% Gnu
  # and return the result
  list(nu = nu, Gnu = Gnu, stat = stat)
})

theSamples <- sapply(sampleResult["nu",], identity)

# this is the critical value of the Chi^2 with alpha = 0.95 and df = number of dimensions
# old and buggy threshold <- pchisq(0.95, df = 2)
# new and awesome - we are looking for the statistic at alpha = .95 quantile
threshold <- qchisq(0.95, df = 2)
# we can accept samples given the threshold (like in equation)
inArea <- sapply(sampleResult["stat",], identity) < threshold

plot(t(theSamples), col = as.integer(inArea)+1)

The red points are the points you would keep (I plot all points here).

enter image description here

As for drawing the decision boundries, I think it is a little bit more complicated, since you need to compute the exact pair of nu so that (Gnu %*% Rginv) %*% Gnu == pchisq(alpha, df = 2). It is a linear system that you solve for Gnu and then apply inverse to get your nu at the decision boundries.

edit: Reading the paragraph again, I noticed, the parameter for Gnu does not change, it is simply Gnu <- qnorm(nu, mean = 0, sd = 1).

edit: There was a bug: for threshold you need to use the quantile function qchisq instead of the distribution function pchisq - now corrected in the code above (and updated the figures).

like image 67
Drey Avatar answered Oct 26 '22 09:10

Drey


This has two parts: first, compute the copula value as a function of X and Y; then, plot the curve giving the boundary where the copula exceeds the threshold.

Computing the value is basically linear algebra which @drey has answered. This is a rewritten version so that the copula is given by a function.

cop1 <- function(x)
{
    Gnu <- qnorm(x)
    Gnu %*% Rginv %*% Gnu
}

copula <- function(x)
{
    apply(x, 1, cop1)
}

Plotting the boundary curve can be done using the same method as here (which in turn is the method used by the textbooks Modern Applied Stats with S, and Elements of Stat Learning). Create a grid of values, and use interpolation to find the contour line at the given height.

Rg <- matrix(c(1,runif(2),1), ncol = 2)
Rginv <- MASS::ginv(Rg)

# draw the contour line where value == threshold
# define a grid of values first: avoid x and y = 0 and 1, where infinities exist
xlim <- 1e-3
delta <- 1e-3
xseq <- seq(xlim, 1-xlim, by=delta)
grid <- expand.grid(x=xseq, y=xseq)
prob.grid <- copula(grid)
threshold <- qchisq(0.95, df=2)

contour(x=xseq, y=xseq, z=matrix(prob.grid, nrow=length(xseq)), levels=threshold,
        col="grey", drawlabels=FALSE, lwd=2)

# add some points
data <- data.frame(x=runif(1000), y=runif(1000))
points(data, col=ifelse(copula(data) < threshold, "red", "black"))

enter image description here

like image 36
Hong Ooi Avatar answered Oct 26 '22 09:10

Hong Ooi