
What are the key components and functions for standard model objects in R?

I have implemented a new statistical model in R and it works in my sandbox, but I would like to make it more standard. A good comparison is lm(), where I can take a model object and:

  • apply the summary() function
  • extract the coefficients of the model
  • extract residuals from the fitted (training) data
  • update the model
  • apply the predict() function
  • apply plot() to pre-selected descriptive plots
  • engage in many other kinds of joy

I've looked through the R manuals, searched online, and thumbed through several books, and, unless I'm overlooking something, I can't find a good tutorial on what should go into a new model package.

Although I'm most interested in thorough references or guides, I'll keep this post focused on a question with two components:

  1. What are the key components that are usually expected to be in a model object?
  2. What are typical functions that are usually implemented in a modeling package?

Answers could come from the perspective of R Core (or package developers) or from the perspective of users; for example, users expect to be able to use functions like summary, predict, residuals, and coefficients, and often expect to pass a formula when fitting a model.

Iterator asked Jul 27 '11 18:07





1 Answer

Put into the object what you think is useful and necessary. I think a more important question is how you include this information and how one accesses it.

At a minimum, provide a print() method so that the entire object doesn't get dumped to the screen when you print it. If you provide a summary() method, the convention is for that method to return an object of class summary.foo (where foo is your class) and to provide a separate print.summary.foo() method; you don't want your summary() method doing any printing itself.
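
The print/summary convention can be sketched as follows. This is only an illustration: the constructor new_foo() and the components n and n_coef are hypothetical names invented for the example, and foo stands in for your own class.

```r
# Hypothetical constructor: builds a minimal object of class "foo".
new_foo <- function(coefficients, n) {
  structure(list(coefficients = coefficients, n = n), class = "foo")
}

# print method: a compact display instead of dumping the whole list.
print.foo <- function(x, ...) {
  cat("<foo> model fitted to", x$n, "observations\n")
  print(x$coefficients)
  invisible(x)
}

# summary method: computes and returns an object, does no printing.
summary.foo <- function(object, ...) {
  out <- list(n = object$n,
              n_coef = length(object$coefficients),
              coefficients = object$coefficients)
  class(out) <- "summary.foo"
  out
}

# Separate print method for the summary object.
print.summary.foo <- function(x, ...) {
  cat("Summary of <foo>:", x$n_coef, "coefficients,", x$n, "observations\n")
  print(x$coefficients)
  invisible(x)
}

fit <- new_foo(c(a = 1.5, b = -0.3), n = 20)
s <- summary(fit)   # silent: the result is stored, nothing is printed
print(s)            # printing is handled by print.summary.foo()
```

Keeping the computation in summary.foo() and the display in print.summary.foo() means users can store the summary and pull values out of it programmatically.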

If you have coefficients, fitted values and residuals, and these are simple, then store them in your returned object as $coefficients, $fitted.values and $residuals respectively. The default methods for coef(), fitted() and resid() will then work without you needing to add your own bespoke methods. If these are not simple, then provide your own coef(), fitted() and residuals() methods for your class. By not simple I mean, for example, that there are several types of residual and you need to process the stored residuals to get the requested type; in that case you need your own method that takes a type argument or similar to select from the available types of residual. See ?residuals.glm for an example.
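
A minimal sketch of both cases, assuming a hypothetical class foo built by hand from a least-squares line. The "standardized" residual type is a deliberately crude illustration (residuals divided by their standard deviation), not a recommended definition.

```r
# Hypothetical fit: a straight-line least-squares fit stored in a "foo" object.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
X <- cbind(`(Intercept)` = 1, x = x)
beta <- drop(solve(crossprod(X), crossprod(X, y)))
fv <- drop(X %*% beta)

fit <- structure(
  list(coefficients = beta,
       fitted.values = fv,
       residuals = y - fv),
  class = "foo"
)

# No coef.foo(), fitted.foo() or residuals.foo() exists yet:
# the default methods pick up the standard components.
coef(fit)
fitted(fit)
resid(fit)

# With several residual types, add a method that takes a `type` argument.
residuals.foo <- function(object, type = c("response", "standardized"), ...) {
  type <- match.arg(type)
  r <- object$residuals
  switch(type,
         response = r,
         standardized = r / sd(r))   # illustrative scaling only
}

resid(fit, type = "standardized")
```

Because resid() dispatches on residuals(), defining residuals.foo() is enough for both spellings to work.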

If predictions can usefully be provided, then provide a predict() method. Look at predict.lm(), for example, to see what arguments should be taken. Likewise, an update() method can be provided if it makes sense to update the model by adding or removing terms or altering model parameters.
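
As a rough sketch, a predict() method for a hypothetical least-squares fitter foo() might look like this. The fitter and its stored components are assumptions made for the example. Note that when the matched call is stored in $call, the default update() method can already refit the model, so a bespoke update() method may not be needed.

```r
# Hypothetical fitting function for a linear model of class "foo".
foo <- function(formula, data) {
  cl <- match.call()
  mf <- model.frame(formula, data)
  X <- model.matrix(attr(mf, "terms"), mf)
  y <- model.response(mf)
  beta <- drop(solve(crossprod(X), crossprod(X, y)))
  structure(list(coefficients = beta, terms = attr(mf, "terms"), call = cl),
            class = "foo")
}

# predict method: follows the object / newdata / ... pattern of predict.lm().
predict.foo <- function(object, newdata, ...) {
  tt <- delete.response(object$terms)            # drop the response
  X <- model.matrix(tt, model.frame(tt, newdata))
  drop(X %*% object$coefficients)
}

fit <- foo(dist ~ speed, data = cars)
predict(fit, newdata = data.frame(speed = c(10, 20)))

# No update.foo() is defined: the default method re-evaluates the stored call.
fit2 <- update(fit, . ~ . + I(speed^2))
coef(fit2)
```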

plot.lm() is an example of a method that provides several diagnostic plots of the fitted model. You could model your method on that function to let the user select from a set of predefined diagnostic plots.
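
A stripped-down sketch of that idea, borrowing the which argument from plot.lm(). The class foo, the simulated object and the choice of two plots are assumptions for illustration only.

```r
# Hypothetical "foo" object with the components the plots need.
set.seed(42)
fv <- seq(1, 10, length.out = 25)
fit <- structure(list(fitted.values = fv, residuals = rnorm(25)),
                 class = "foo")

# `which` selects from a fixed menu of diagnostic plots, as in plot.lm().
plot.foo <- function(x, which = 1:2, ...) {
  if (1 %in% which) {
    plot(x$fitted.values, x$residuals,
         xlab = "Fitted values", ylab = "Residuals",
         main = "Residuals vs fitted", ...)
    abline(h = 0, lty = 2)
  }
  if (2 %in% which) {
    qqnorm(x$residuals, main = "Normal Q-Q", ...)
    qqline(x$residuals)
  }
  invisible(x)
}

plot(fit, which = 1)   # only the residuals-vs-fitted plot
```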

If your model has a likelihood, then providing a logLik() method to compute or extract it from the fitted model object would be standard; deviance() is a similar function, if such a thing is pertinent. For confidence intervals on parameters, confint() is the standard method.
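
A sketch of a logLik() method, assuming a hypothetical foo object with Gaussian errors that stores its residuals and the number of mean parameters in a made-up n_par component. Returning an object of class "logLik" with df and nobs attributes lets functions such as AIC() build on it.

```r
# Hypothetical "foo" object: residuals from a Gaussian model plus a
# count of mean parameters (n_par is an invented component name).
m <- lm(dist ~ speed, data = cars)
fit <- structure(list(residuals = resid(m), n_par = 2L), class = "foo")

logLik.foo <- function(object, ...) {
  r <- object$residuals
  n <- length(r)
  sigma2 <- sum(r^2) / n                       # ML estimate of the variance
  val <- -n / 2 * (log(2 * pi * sigma2) + 1)   # Gaussian log-likelihood at the MLE
  # df counts the mean parameters plus the variance.
  structure(val, df = object$n_par + 1L, nobs = n, class = "logLik")
}

logLik(fit)
AIC(fit)   # the default AIC() method uses logLik() and its df attribute
```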

If you have a formula interface, then a formula() method can extract the formula. If you store it in a place that the default method searches, your life will be made easier. A simple way to do this is to store the matched call (match.call()) in the $call component. model.frame() and model.matrix() are the standard extractor functions for the model frame (the data) and the model matrix (the expanded model frame: factors converted to variables using contrasts, plus any transformations or functions of the model frame data). Look at examples from standard R modelling functions for ideas on how to store and extract this information.
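
One possible layout, sketched with a hypothetical fitter foo() that does no actual fitting and only stores the pieces discussed here. The component names call, terms and model follow what lm objects use; the two one-line methods are assumptions about how you might choose to expose them.

```r
# Hypothetical fitter that only records the formula-related components.
foo <- function(formula, data) {
  cl <- match.call()                    # the matched call
  mf <- model.frame(formula, data)      # the model frame (the data)
  structure(list(call = cl, terms = attr(mf, "terms"), model = mf),
            class = "foo")
}

# Extractors for the stored model frame and the expanded model matrix.
model.frame.foo <- function(formula, ...) formula$model
model.matrix.foo <- function(object, ...) {
  model.matrix(object$terms, object$model, ...)
}

fit <- foo(mpg ~ wt + factor(cyl), data = mtcars)

formula(fit)                  # default method finds the stored terms/call
head(model.frame(fit), 3)
colnames(model.matrix(fit))   # factor(cyl) expanded via contrasts
```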

If you do use a formula interface, try to follow the standard non-standard evaluation approach used by most R modelling functions that have a formula interface. You can find details of that on the R Developer page, in particular the document by Thomas Lumley. This gives plenty of advice on making your function work the way one expects an R modelling function to work.

If you follow these standard (non-standard) rules, then extractors like na.action() should just work.
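
A small sketch of what "just work" can look like, assuming a hypothetical fitter foo() that builds its model frame the usual way and stores the resulting na.action attribute in the object. No na.action() or residuals() method is defined for the class.

```r
# Hypothetical fitter that keeps the na.action information from the model frame.
foo <- function(formula, data, na.action = na.omit) {
  mf <- model.frame(formula, data, na.action = na.action)
  X <- model.matrix(attr(mf, "terms"), mf)
  y <- model.response(mf)
  beta <- drop(solve(crossprod(X), crossprod(X, y)))
  structure(list(coefficients = beta,
                 residuals = drop(y - X %*% beta),
                 na.action = attr(mf, "na.action")),
            class = "foo")
}

d <- data.frame(x = c(1, 2, NA, 4, 5), y = c(1.1, 2.3, 2.9, 4.2, 4.8))
fit <- foo(y ~ x, d, na.action = na.exclude)

na.action(fit)   # default extractor finds the stored component
resid(fit)       # default method pads the excluded row with NA
```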

Gavin Simpson answered Oct 22 '22 11:10
