I have implemented a new statistical model in R and it works in my sandbox, but I would like to make it more standard. A good comparison is lm(), where I can take a model object and call the summary() function, call the predict() function, and use plot() to produce pre-selected descriptive plots.
I've looked through the R manuals, searched online, and thumbed through several books, and, unless I'm overlooking something, I can't find a good tutorial on what should go into a new model package.
Although I'm most interested in thorough references or guides, I'll keep this post focused on a question with two components:
Answers could be from the R Core (or package developers) perspective or from the perspective of users, e.g. users expect to be able to use functions like summary, predict, residuals, coefficients, and often expect to pass a formula when fitting a model.
The model object itself is just a list, with an attribute identifying its class (lm, in the example I'm citing). You'll need to return this list from the function that runs your algorithm, via code like:
    result <- list(obj = model, data = data, parameters = params)
    class(result) <- 'whatever'
    return(result)
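As a minimal, runnable sketch of that pattern: the fitting function `fit_mymodel()`, the class name "mymodel", and the list components below are hypothetical names chosen only for illustration, and the "model" is nothing more than a sample mean.

```r
# Hypothetical fitting function: "fits" a mean and returns a classed list.
fit_mymodel <- function(x) {
  result <- list(
    coefficients = c(mean = mean(x)),   # the estimated quantity
    data         = x,                   # the data the model was fitted to
    parameters   = list(n = length(x))  # anything else worth keeping
  )
  class(result) <- "mymodel"            # the class attribute drives S3 dispatch
  result
}

fit <- fit_mymodel(c(1, 2, 3, 4))
inherits(fit, "mymodel")
```

Because the object is still an ordinary list underneath, components remain accessible with `$`, while generics such as print() or summary() can dispatch on the class.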
Put into the object what you think is useful and necessary. I think a more important question is how you include this information and how one accesses it.
At a minimum, provide a print() method so the entire object doesn't get dumped to the screen when you print the object. If you provide a summary() method, the convention is to have that method return an object of class summary.foo (where foo is your class) and then provide a print.summary.foo() method; you don't want your summary() method doing any printing in and of itself.
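A small sketch of that convention, assuming a class called "foo" whose objects carry `$coefficients` and `$residuals` (the object contents and the reported quantities are made up for illustration):

```r
# Hypothetical fitted object of class "foo".
fit <- structure(
  list(coefficients = c(a = 1.5, b = -0.3),
       residuals    = c(0.1, -0.2, 0.1)),
  class = "foo"
)

# Compact display instead of dumping the whole list.
print.foo <- function(x, ...) {
  cat("Model of class 'foo'\nCoefficients:\n")
  print(x$coefficients)
  invisible(x)
}

# summary() only computes; it returns a "summary.foo" object and prints nothing.
summary.foo <- function(object, ...) {
  out <- list(coefficients = object$coefficients,
              rss          = sum(object$residuals^2),
              n            = length(object$residuals))
  class(out) <- "summary.foo"
  out
}

# Printing of the summary is handled by its own method.
print.summary.foo <- function(x, ...) {
  cat("Summary of 'foo' model\n")
  cat("Observations:", x$n, "\n")
  cat("Residual sum of squares:", format(x$rss), "\n")
  invisible(x)
}

s <- summary(fit)  # nothing is printed here; typing `s` at the prompt prints it
```

Keeping computation and printing separate means users can store the summary and pull numbers out of it, which is what they can do with summary.lm objects.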
If you have coefficients, fitted values and residuals and these are simple, then you can store them in your returned object as $coefficients, $fitted.values and $residuals respectively. Then the default methods for coef(), fitted() and resid() will work without you needing to add your own bespoke methods. If these are not simple, then provide your own methods for coef(), fitted() and residuals() for your class (coefficients(), fitted.values() and resid() are aliases that dispatch to the same methods). By not simple, I mean, for example, if there are several types of residual and you need to process the stored residuals to get the requested type; then you need your own method that takes a type argument or similar to select from the available types of residual. See ?residuals.glm for an example.
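To illustrate both cases, here is a sketch with a hand-built object of a hypothetical class "foo". The `$scale` component and the "scaled" residual type are assumptions invented for the example, not anything standard.

```r
# Hypothetical object storing the conventional components.
fit <- structure(
  list(coefficients  = c(a = 1, b = 2),
       fitted.values = c(1, 2, 3),
       residuals     = c(0.5, -1, 0.5),
       scale         = 2),  # made-up scale used by the "scaled" residual type
  class = "foo"
)

# Simple case: the default methods find the components by name.
coef(fit)
fitted(fit)
resid(fit)

# Not-so-simple case: several residual types, selected via a `type` argument.
residuals.foo <- function(object, type = c("raw", "scaled"), ...) {
  type <- match.arg(type)
  r <- object$residuals
  switch(type,
         raw    = r,
         scaled = r / object$scale)
}

resid(fit, type = "scaled")  # resid() dispatches to residuals.foo()
```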
If predictions are something that can be usefully provided, then a predict() method could be provided. Look at the predict.lm() method for example to see what arguments should be taken. Likewise, an update() method can be provided if it makes sense to update the model by adding/removing terms or altering model parameters.
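A rough sketch of both ideas, assuming a hypothetical fitter `foo()` that does ordinary least squares through lm.fit() purely as a stand-in for a real algorithm. The predict method here only handles `newdata`; predict.lm() takes more arguments (se.fit, interval, and so on). Note that the default update() method may already be enough if the matched call is stored in `$call`, since it modifies that call and re-evaluates it.

```r
# Hypothetical fitter; stores the terms and the matched call.
foo <- function(formula, data) {
  cl  <- match.call()
  mf  <- model.frame(formula, data)
  mt  <- attr(mf, "terms")
  est <- lm.fit(model.matrix(mt, mf), model.response(mf))
  structure(list(coefficients = est$coefficients, terms = mt, call = cl),
            class = "foo")
}

# Minimal predict method: rebuild the design matrix for the new data.
predict.foo <- function(object, newdata, ...) {
  if (missing(newdata)) stop("this sketch requires 'newdata'")
  tt <- delete.response(object$terms)            # drop the response from the terms
  X  <- model.matrix(tt, model.frame(tt, newdata))
  drop(X %*% object$coefficients)
}

fit <- foo(mpg ~ wt, data = mtcars)
nd  <- data.frame(wt = c(2.5, 3.5))
predict(fit, newdata = nd)

# The default update() method re-evaluates the stored call with a new formula.
fit2 <- update(fit, . ~ . + hp)
names(coef(fit2))
```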
plot.lm() gives an example of a method that provides several diagnostic plots of the fitted model. You could model your method on that function to select from a set of predefined diagnostic plots.
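In the spirit of plot.lm(), a plot method can take a `which` argument that selects among predefined plots. The sketch below assumes a hypothetical class "foo" with `$fitted.values` and `$residuals`, and offers just two panels; the choice of panels is illustrative.

```r
# Hypothetical object with fitted values and residuals.
fit <- structure(
  list(fitted.values = c(1.1, 1.9, 3.2, 3.8, 5.0),
       residuals     = c(-0.1, 0.1, -0.2, 0.2, 0.0)),
  class = "foo"
)

# `which` selects from the predefined diagnostics: 1 = residuals vs fitted,
# 2 = normal Q-Q plot of the residuals.
plot.foo <- function(x, which = 1:2, ...) {
  show <- 1:2 %in% which
  if (show[1]) {
    plot(fitted(x), resid(x),
         xlab = "Fitted values", ylab = "Residuals",
         main = "Residuals vs fitted")
    abline(h = 0, lty = 2)
  }
  if (show[2]) {
    qqnorm(resid(x), main = "Normal Q-Q")
    qqline(resid(x))
  }
  invisible(x)
}

if (interactive()) plot(fit, which = 1)  # draw only the first diagnostic
```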
If your model has a likelihood, then providing a logLik() method to compute or extract it from the fitted model object would be standard; deviance() is another similar function if such a thing is pertinent. For confidence intervals on parameters, confint() is the standard method.
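As a sketch of how these pieces fit together: a logLik() method that returns an object of class "logLik" with `df` and `nobs` attributes lets the default AIC() work, and if coef() and vcov() work for your class, the default confint() method produces Wald intervals. The object below and all of its numbers are made up for illustration.

```r
# Hypothetical fitted object; the stored values are invented.
fit <- structure(
  list(coefficients = c(mu = 2.0),
       vcov         = matrix(0.25, 1, 1, dimnames = list("mu", "mu")),
       loglik       = -12.3,
       df           = 2L,    # number of estimated parameters
       nobs         = 10L),
  class = "foo"
)

# Return a proper "logLik" object so AIC() and friends can use it.
logLik.foo <- function(object, ...) {
  structure(object$loglik, df = object$df, nobs = object$nobs,
            class = "logLik")
}

# confint.default() needs coef() and vcov(); coef() already works via
# $coefficients, so only vcov() needs a method here.
vcov.foo <- function(object, ...) object$vcov

AIC(fit)      # computed as -2 * logLik + 2 * df
confint(fit)  # Wald interval: estimate +/- normal quantile * standard error
```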
If you have a formula interface, then formula() methods can extract it. If you store it in a place that the default method searches for, then your life will be made easier. A simple way to do this is to store the matched call (match.call()) in the $call component. Methods to extract the model frame (model.frame()) and the model matrix (model.matrix()) are standard extractor functions: the model frame is the data, and the model matrix is its expanded form (factors converted to variables using contrasts, plus any transformations or functions of the model frame data). Look at examples from standard R modelling functions for ideas on how to store/extract this information.
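One possible way to store and extract these, sketched for a hypothetical fitter `foo()` that does no actual fitting: the component names `$call`, `$terms` and `$model` follow what lm objects use, and the default formula() method looks for `$formula`, `$terms` and `$call$formula`.

```r
# Hypothetical fitter that only records the pieces needed by the extractors.
foo <- function(formula, data) {
  cl <- match.call()                 # the call as the user wrote it
  mf <- model.frame(formula, data)   # the data actually used
  structure(list(call = cl, terms = attr(mf, "terms"), model = mf),
            class = "foo")
}

# Extractor methods: return the stored frame, and expand it on request.
model.frame.foo  <- function(formula, ...) formula$model
model.matrix.foo <- function(object, ...) {
  model.matrix(object$terms, object$model)
}

fit <- foo(mpg ~ wt + factor(cyl), data = mtcars)

formula(fit)                 # found by the default method, no formula.foo needed
head(model.frame(fit))       # the variables as used, factor(cyl) still one column
colnames(model.matrix(fit))  # factor expanded into contrast columns
```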
If you do use a formula interface, try to follow the standard, non-standard evaluation method used in most R model objects that have a formula interface/method. You can find details of that on the R Developer page, in particular the document by Thomas Lumley. This gives plenty of advice on making your function work like one expects an R modelling function to work.
If you follow this paradigm, that is, the standard (non-standard) evaluation rules, then extractors like na.action() should just work.
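For reference, the idiom used by lm() and many other modelling functions looks roughly like the sketch below: the matched call is trimmed down to the arguments model.frame() understands, its function name is swapped for model.frame, and it is evaluated in the caller's frame. The fitter `foo()` and its use of lm.fit() are hypothetical stand-ins for a real algorithm.

```r
# Hypothetical fitter following the lm()-style evaluation idiom.
foo <- function(formula, data, subset, na.action, ...) {
  cl <- match.call()
  mf <- match.call(expand.dots = FALSE)
  # Keep only the arguments that model.frame() knows about.
  m  <- match(c("formula", "data", "subset", "na.action"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  mf[[1L]] <- quote(stats::model.frame)
  mf <- eval(mf, parent.frame())     # evaluate where the user called foo()
  mt <- attr(mf, "terms")
  est <- lm.fit(model.matrix(mt, mf), model.response(mf))
  structure(list(coefficients  = est$coefficients,
                 residuals     = est$residuals,
                 fitted.values = est$fitted.values,
                 na.action     = attr(mf, "na.action"),  # rows dropped, if any
                 terms = mt, call = cl),
            class = "foo")
}

d   <- data.frame(y = c(1, 2, NA, 4, 5), x = c(1, 2, 3, 4, 5))
fit <- foo(y ~ x, data = d)
na.action(fit)                           # default method reads $na.action

fit_sub <- foo(y ~ x, data = d, subset = x > 1)  # `subset` is evaluated in `d`
length(fitted(fit_sub))
```

Because `subset` and `na.action` are passed through unevaluated to model.frame(), they behave the way users expect from lm(), and nothing extra is needed for na.action() to report the omitted rows.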