Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the joint cumulative distribution function from a 2-D copula in R?

I am now working on copula in R and I wonder how to find the joint cumulative distribution in R?

D = c(1,3,2,2,8,2,1,3,1,1,3,3,1,1,2,1,2,1,1,3,4,1,1,3,1,1,2,1,3,7,1,4,6,1,2,1,1,3,1,2,2,3,4,1,1,1,1,2,2,12,1,1,2,1,1,1,3,4)
S = c(1.42,5.15,2.52,2.29,12.36,2.82,1.49,3.53,1.17,1.03,4.03,5.26,1.65,1.41,3.75,1.09,3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)

After some exploration, I find that Gamma distribution is the best to describe the above data.

library(fitdistrplus)
fg_d <- fitdist(data = Dur, distr = "gamma", method = "mle")
fg_s <- fitdist(data = Sev, distr = "gamma", method = "mle")

Then, I try to select the copula family using the VineCopula packge:

mydata <- cbind(D=D, S=S)
u1 <- pobs(mydata[,1]) 
u2 <- pobs(mydata[,2])
fitCopula <- BiCopSelect(u1, u2, familyset=NA)
summary(fitCopula) 

The result indicats a "Survival Clayton". Then, I try to build the following copula:

library(copula)
cop_model <- surClaytonCopula(param = 5.79)

Now, according to the equation below (E(L) is assumed to be a constant): enter image description here

I need to find FD(d), FS(s), and C(FD(d),FS(s)) for given D and S values.

For example, if we take D=3 and S=2, then we have to find F(D<=3), F(S<=2), and C(D<=3 and S<=2). I wonder how to do this in R using the package copula?

Also, how can we find C(D<=3 or S<=2)? Thanks for any help.

like image 368
Yang Yang Avatar asked Mar 04 '19 07:03

Yang Yang


People also ask

How do you find the cumulative distribution function in R?

To calculate the cumulative distribution function in the R Language, we use the ecdf() function. The ecdf() function in R Language is used to compute and plot the value of the Empirical Cumulative Distribution Function of a numeric vector.

How do you find the cumulative distribution of a joint?

The joint cumulative distribution function of two random variables X and Y is defined as FXY(x,y)=P(X≤x,Y≤y). As usual, comma means "and," so we can write FXY(x,y)=P(X≤x,Y≤y)=P((X≤x) and (Y≤y))=P((X≤x)∩(Y≤y)). Figure 5.2 shows the region associated with FXY(x,y) in the two-dimensional plane.

How do you find the joint CDF from a joint PMF?

In the discrete case, we can obtain the joint cumulative distribution function (joint cdf) of X and Y by summing the joint pmf: F(x,y)=P(X≤x and Y≤y)=∑xi≤x∑yj≤yp(xi,yj), where xi denotes possible values of X and yj denotes possible values of Y.

How do you find the joint distribution function?

If continuous random variables X and Y are defined on the same sample space S, then their joint probability density function (joint pdf) is a piecewise continuous function, denoted f(x,y), that satisfies the following. F(a,b)=P(X≤a and Y≤b)=b∫−∞a∫−∞f(x,y)dxdy.


1 Answers

Here's an answer using only base R and the copula package:


  • FD(d) is a gamma CDF. According to your code it has shape 2.20 and rate 0.98 and so FD(3) is pgamma(3, 2.20, 0.98) = 0.7495596

  • FS(s) is a gamma CDF. According to your code it has shape 1.56 and rate 0.45 and so FS(2) is pgamma(2, 1.56, 0.45) = 0.3631978

  • C(FD(d), FS(s)) is the survival Clayton Copula (also known as the rotated Clayton copula) evaluated with the aforementioned marginals. In R this is

library(copula)
D_shape <- 2.20
D_rate  <- 0.98
S_shape <- 1.56
S_rate  <- 0.45
surv_clay <- rotCopula(claytonCopula(5.79))
pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
  • The denominator of Equation (23) on page 810 of the Shiau 2006 paper in the OP comments shows that P(D>=3 or S>=2) = 1- C(FD(d), FS(s)) which is:
1 - pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
  • P(D<=3 or S<=2) = P(D<=3) + P(S<=2) - P(D<=3,S<=2) so
 pgamma(3, D_shape, D_rate) + 
 pgamma(2, S_shape, S_rate) - 
 pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)

Sources

  • R-bloggers post on Modelling Dependence with Copulas
  • Realization that VineCopula::surClaytonCopula(5.79) corresponds to a copula::rotCopula(copula::claytonCopula(5.79)) from reading copula manual

Below is some code to double check some things via simulation.

library(fitdistrplus)
library(copula)
library(VineCopula)

D = c(1,3,2,2,8,2,1,3,1,1,3,3,1,1,2,1,2,1,1,3,4,1,1,3,1,1,2,1,3,7,1,4,6,1,2,1,1,3,1,2,2,3,4,1,1,1,1,2,2,12,1,1,2,1,1,1,3,4)
S = c(1.42,5.15,2.52,2.29,12.36,2.82,1.49,3.53,1.17,1.03,4.03,5.26,1.65,1.41,3.75,1.09,3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)

(fg_d <- fitdist(data = D, distr = "gamma", method = "mle"))
(fg_s <- fitdist(data = S, distr = "gamma", method = "mle"))

mydata <- cbind(D=D, S=S)
u1 <- pobs(mydata[,1]) 
u2 <- pobs(mydata[,2])
fitCopula <- BiCopSelect(u1, u2, familyset=NA)
summary(fitCopula) 

D_shape <- fg_d$estimate[1]
D_rate <-  fg_d$estimate[2]
S_shape <- fg_s$estimate[1]
S_rate <-  fg_s$estimate[2]

copula_dist <- mvdc(copula=rotCopula(claytonCopula(5.79)), margins=c("gamma","gamma"),
                    paramMargins=list(list(shape=D_shape, rate=D_rate),
                                      list(shape=S_shape, rate=S_rate)))

sim <- rMvdc(n = 1e5,
             copula_dist)

plot(sim, col="red")
points(D,S, col="black")
legend('bottomright',c('Observed','Simulated'),col=c('black','red'),pch=21)

And to answer the questions about specific calculations:

## F_D(d) for d=3
mean(sim[,1] <=3)          ## simulated
pgamma(3, D_shape, D_rate) ## theory

## F_S(s) for s=2
mean(sim[,2] <=2)          ## simulated
pgamma(2, S_shape, S_rate) ## theory

## C(F_D(d) for d=3 AND F_S(s) for s=2)
## simulated value:
mean(sim[,1] <=3 & sim[,2] <=2)
## with copula:
surv_clay <- rotCopula(claytonCopula(5.79))
pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)

## P(D>=3 or S>=2)
## simulated
mean(sim[,1] >= 3 | sim[,2] >=2)
## with copula:
1-pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)

## In case you want:
## P(D<=3 or S<=2) = P(D<=3) + P(S<=2) - P(D<=3,S<=2)
## simulated:
mean(sim[,1] <= 3 | sim[,2] <= 2)
## theory with copula:
pgamma(3, D_shape, D_rate) + pgamma(2, S_shape, S_rate) - pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)

Running the two chunks of code above gives the following output:

> (fg_d <- fitdist(data = D, distr = "gamma", method = "mle"))
Fitting of the distribution ' gamma ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape 2.2082572  0.3831383
rate  0.9775783  0.1903410
> (fg_s <- fitdist(data = S, distr = "gamma", method = "mle"))
Fitting of the distribution ' gamma ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape 1.5628338 0.26500235
rate  0.4494518 0.08964724
> 
> mydata <- cbind(D=D, S=S)
> u1 <- pobs(mydata[,1]) 
> u2 <- pobs(mydata[,2])
> fitCopula <- BiCopSelect(u1, u2, familyset=NA)
Warning message:
In cor(x[(x[, 1] < 0) & (x[, 2] < 0), ]) : the standard deviation is zero
> summary(fitCopula) 
Family
------ 
No:    13
Name:  Survival Clayton

Parameter(s)
------------
par:  5.79

Dependence measures
-------------------
Kendall's tau:    0.74 (empirical = 0.82, p value < 0.01)
Upper TD:         0.89 
Lower TD:         0 

Fit statistics
--------------
logLik:  57.68 
AIC:    -113.37 
BIC:    -111.31 

> 
> D_shape <- fg_d$estimate[1]
> D_rate <-  fg_d$estimate[2]
> S_shape <- fg_s$estimate[1]
> S_rate <-  fg_s$estimate[2]
> 
> copula_dist <- mvdc(copula=rotCopula(claytonCopula(5.79)), margins=c("gamma","gamma"),
+                     paramMargins=list(list(shape=D_shape, rate=D_rate),
+                                       list(shape=S_shape, rate=S_rate)))
> 
> sim <- rMvdc(n = 1e5,
+              copula_dist)
> 
> plot(sim, col="red")
> points(D,S, col="black")
> legend('bottomright',c('Observed','Simulated'),col=c('black','red'),pch=21)

enter image description here

And --

> ## F_D(d) for d=3
> mean(sim[,1] <=3)          ## simulated
[1] 0.74759
> pgamma(3, D_shape, D_rate) ## theory
[1] 0.746482
> 
> ## F_S(s) for s=2
> mean(sim[,2] <=2)          ## simulated
[1] 0.36233
> pgamma(2, S_shape, S_rate) ## theory
[1] 0.3617122
> 
> ## C(F_D(d) for d=3 AND F_S(s) for s=2)
> ## simulated value:
> mean(sim[,1] <=3 & sim[,2] <=2)
[1] 0.362
> ## with copula:
> surv_clay <- rotCopula(claytonCopula(5.79))
> pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
[1] 0.3615195
> 
> ## P(D>=3 or S>=2)
> ## simulated
> mean(sim[,1] >= 3 | sim[,2] >=2)
[1] 0.638
> ## with copula:
> 1-pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
[1] 0.6384805

> ## In case you want:
> ## P(D<=3 or S<=2) = P(D<=3) + P(S<=2) - P(D<=3,S<=2)
> ## simulated:
> mean(sim[,1] <= 3 | sim[,2] <= 2)
[1] 0.74792
> ## theory with copula:
> pgamma(3, D_shape, D_rate) + pgamma(2, S_shape, S_rate) - pCopula(c(pgamma(3, D_shape, D_rate),pgamma(2, S_shape, S_rate)), surv_clay)
[1] 0.7466747
like image 146
swihart Avatar answered Oct 24 '22 10:10

swihart