
Where to define distribution function to be used with fitdist (fitdistrplus) or fitdistr (MASS)?

I want to define my own distribution functions to be used with the fitdist or fitdistr function in R. Using fitdist from the fitdistrplus package as an example, I define a customized distribution called sgamma as follows:

dsgamma<-function(x,shape){return(dgamma(x,shape,scale=1));}
qsgamma<-function(p,shape){return(qgamma(p,shape,scale=1));}
psgamma<-function(q,shape){return(pgamma(q,shape,scale=1));}
rsgamma<-function(n,shape){return(rgamma(n,shape,scale=1));}

My question is where I should define these functions.

If the above definitions are made in the top-level environment, I can call fitdist with this distribution. In other words, my script test1.R with the following content runs just fine:

rm(list=ls())
require(fitdistrplus);
dsgamma<-function(x,shape){return(dgamma(x,shape,scale=1));}
qsgamma<-function(p,shape){return(qgamma(p,shape,scale=1));}
psgamma<-function(q,shape){return(pgamma(q,shape,scale=1));}
rsgamma<-function(n,shape){return(rgamma(n,shape,scale=1));}
x<-rgamma(100, shape=0.4, scale=1);
zfit<-fitdist(x, distr=dsgamma, start=list(shape=0.3));

Now, if I wrap the above code in a function, it no longer works. See test2.R below:

rm(list=ls())
testfit<-function(x)
{
    require(fitdistrplus);
    dsgamma<-function(x,shape){return(dgamma(x,shape,scale=1));}
    qsgamma<-function(p,shape){return(qgamma(p,shape,scale=1));}
    psgamma<-function(q,shape){return(pgamma(q,shape,scale=1));}
    rsgamma<-function(n,shape){return(rgamma(n,shape,scale=1));}
    zfit<-fitdist(x, distr=dsgamma, start=list(shape=0.3));
    return(zfit);
}

x<-rgamma(100, shape=0.4, scale=1);
zfit<-testfit(x);

I got the following error:

Error in fitdist(x, distr = dsgamma, start = list(shape = 0.3)) : 
  The  dsgamma  function must be defined

Note that I still get an error if I replace

zfit<-fitdist(x, distr=dsgamma, start=list(shape=0.3));

with

zfit<-fitdist(x, distr="sgamma", start=list(shape=0.3));

I guess the key question is where fitdist looks for the function specified by the parameter distr. I would really appreciate your help.

huang asked Sep 30 '22 06:09


1 Answer

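Great question. The reason for this error is that the authors of the fitdistrplus package use exists() to check that the d- and p-prefixed functions derived from distr (here dsgamma and psgamma) are defined.

The following is an excerpt from the code of the fitdist and mledist functions, where the authors take the value given for distr and search for the corresponding density and probability functions. Because exists() is called without an envir argument, the search starts in the frame of fitdist or mledist and continues through the environment where those functions are defined and on to the global environment. It never visits the frame of your testfit().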

if (!exists(ddistname,mode="function"))
    stop(paste("The ", ddistname, " function must be defined"))
pdistname <- paste("p", distname, sep = "")
if (!exists(pdistname,mode="function"))
    stop(paste("The ", pdistname, " function must be defined"))

This is an excerpt from how exists works:

This function looks to see if the name ‘x’ has a value bound to it in the specified environment. If ‘inherits’ is ‘TRUE’ and a value is not found for ‘x’ in the specified environment, the enclosing frames of the environment are searched until the name ‘x’ is encountered. See ‘environment’ and the ‘R Language Definition’ manual for details about the structure of environments and their enclosures.
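The lookup rule quoted above can be reproduced in base R without fitdistrplus. The following is only a sketch: check_density() and caller() are hypothetical stand-ins for fitdist() and testfit(), and dlocalonly is a made-up name chosen so that it cannot clash with anything on the search path.

```r
# Hypothetical stand-in for a package function such as fitdist(). Its enclosing
# environment is wherever it was defined, not the frame of whoever calls it.
check_density <- function(distname) {
  ddistname <- paste("d", distname, sep = "")
  # No envir argument: the search starts in this function's own frame and
  # then follows the chain of enclosing environments.
  exists(ddistname, mode = "function")
}

# Hypothetical caller that defines the density locally, as testfit() does.
caller <- function() {
  dlocalonly <- function(x, shape) dgamma(x, shape, scale = 1)
  check_density("localonly")
}

found_local <- caller()                # dlocalonly lives only in caller()'s frame
found_gamma <- check_density("gamma")  # dgamma is reachable via the search path
print(c(local = found_local, gamma = found_gamma))
```

Here found_local is FALSE and found_gamma is TRUE: the locally defined function is invisible to the check even though it exists at the moment of the call.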

To learn more about why exists is making your function fail, check this article: http://adv-r.had.co.nz/Environments.html

Essentially, fitdist and mledist do not search the environment of the function you are creating, so they report that dsgamma (and the other functions you define) do not exist.

This can be most easily circumvented by using <<- instead of <- to define the functions within your testfit(). Since no variable of that name exists in an enclosing environment, <<- creates your child functions in the global environment, where fitdist can find them.

 > testfit<-function(x)
 + {
 +     require(fitdistrplus);
 +     dsgamma<<-function(x,shape){return(dgamma(x,shape,scale=1))}
 +     qsgamma<<-function(p,shape){return(qgamma(p,shape,scale=1))}
 +     psgamma<<-function(q,shape){return(pgamma(q,shape,scale=1))}
 +     rsgamma<<-function(n,shape){return(rgamma(n,shape,scale=1))}
 +     zfit<-function(x){return(fitdist(x,distr="sgamma", start=list(shape=0.3)))};
 +     return(zfit(x))
 + }
 > testfit(x)
 Fitting of the distribution ' sgamma ' by maximum likelihood
 Parameters:
       estimate Std. Error
 shape 0.408349 0.03775797
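One consequence of the <<- approach is that the functions remain in the global environment after testfit() returns. A small base R sketch of that side effect, assuming a fresh session and using the hypothetical names define_globally() and dsketchdist:

```r
# Hypothetical helper mirroring the shape of testfit(), without fitdistrplus.
define_globally <- function() {
  # `<<-` walks up the enclosing environments; finding no existing binding,
  # it creates dsketchdist in the global environment.
  dsketchdist <<- function(x, shape) dgamma(x, shape, scale = 1)
  invisible(NULL)
}

# Check the global environment only (inherits = FALSE), before and after.
before <- exists("dsketchdist", envir = globalenv(), inherits = FALSE)
define_globally()
after <- exists("dsketchdist", envir = globalenv(), inherits = FALSE)
print(c(before = before, after = after))
```

If that leftover binding is unwanted, it can be removed afterwards with rm(dsketchdist, envir = globalenv()).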

You can alter the code of fitdist to search in your function's environment by adding envir=parent.frame() to the exists checks as follows, but I do not recommend this.

if (!exists(ddistname,mode="function",envir=parent.frame()))

However, this still doesn't solve your problem as fitdist calls mledist and mledist has the same problem.

 Error in mledist(data, distname, start, fix.arg, ...) (from #43) :
   The  dsgamma  function must be defined

To pursue this approach you will have to alter mledist as well and tell it to search in the parent frame of fitdist. You will have to make these changes each time you load the package.
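Why patching only the outer function is not enough can be sketched in base R. The names outer_check(), inner_check(), user_fun() and dnested below are hypothetical stand-ins for a patched fitdist(), an unpatched-depth mledist(), testfit() and dsgamma. This is an illustration of the frame chain, not the package's actual code.

```r
# Hypothetical inner function, standing in for mledist().
inner_check <- function(ddistname) {
  # parent.frame() here is the frame of outer_check(), one level short of
  # the user's function where the density is defined.
  exists(ddistname, mode = "function", envir = parent.frame())
}

# Hypothetical outer function, standing in for a patched fitdist().
outer_check <- function(distname) {
  ddistname <- paste("d", distname, sep = "")
  c(outer = exists(ddistname, mode = "function", envir = parent.frame()),
    inner = inner_check(ddistname))
}

# Hypothetical user function that defines the density locally.
user_fun <- function() {
  dnested <- function(x, shape) dgamma(x, shape, scale = 1)
  outer_check("nested")
}

result <- user_fun()
print(result)
```

The outer check succeeds and the inner one fails, which mirrors the mledist error shown above: each extra level of function call needs its own adjustment to reach the user's frame.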

bjoseph answered Oct 07 '22 21:10