
Error in contrasts when defining a linear model in R

When I try to define my linear model in R as follows:

lm1 <- lm(predictorvariable ~ x1+x2+x3, data=dataframe.df) 

I get the following error message:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :  contrasts can be applied only to factors with 2 or more levels  

Is there any way to ignore this or fix it? Some of the variables are factors and some are not.

REnthusiast asked Aug 11 '13 11:08





1 Answer

This error occurs when an independent (right-hand-side) variable is a factor or character that takes only one value in the data used to fit the model.

Example: iris data in R

(model1 <- lm(Sepal.Length ~ Sepal.Width + Species, data=iris))
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
#
# Coefficients:
#       (Intercept)        Sepal.Width  Speciesversicolor   Speciesvirginica
#            2.2514             0.8036             1.4587             1.9468

Now, if your data consists of only one species:

(model1 <- lm(Sepal.Length ~ Sepal.Width + Species,
              data=iris[iris$Species == "setosa", ]))
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
#   contrasts can be applied only to factors with 2 or more levels
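One way to confirm the cause is to count the levels that actually occur in the subset. This is a minimal sketch; `setosa_only` is a name introduced here for illustration. Subsetting keeps all three level labels on the factor, but lm() drops unused levels when it builds the model frame, so only one level is left for the contrasts.

```r
# Subset containing a single species (illustrative name, not from the question)
setosa_only <- iris[iris$Species == "setosa", ]

# The factor still carries all three level labels after subsetting
nlevels(setosa_only$Species)

# Only one of those levels actually occurs in the data
nlevels(droplevels(setosa_only$Species))

# Counts per level make the empty levels visible
table(setosa_only$Species)
```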

If the variable is numeric (Sepal.Width) but takes only a single value, say 3, the model still runs, but the coefficient of that variable is NA because a constant column cannot be separated from the intercept:

(model2 <- lm(Sepal.Length ~ Sepal.Width + Species,
              data=iris[iris$Sepal.Width == 3, ]))
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Species,
#    data = iris[iris$Sepal.Width == 3, ])
#
# Coefficients:
#       (Intercept)        Sepal.Width  Speciesversicolor   Speciesvirginica
#             4.700                 NA              1.250              2.017

Solution: An independent variable that takes only one value has no variation, so its effect cannot be estimated. You need to drop that variable, whether it is numeric, character, or factor.
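For the single-species subset, dropping the constant variable amounts to removing Species from the formula. A small sketch using update(); the names `setosa_only`, `full_formula`, and `reduced_formula` are illustrative and not part of the question.

```r
# Subset in which Species takes only one value (illustrative name)
setosa_only <- iris[iris$Species == "setosa", ]

full_formula <- Sepal.Length ~ Sepal.Width + Species

# Remove the constant predictor from the right-hand side
reduced_formula <- update(full_formula, . ~ . - Species)

# The reduced model fits without the contrasts error
fit <- lm(reduced_formula, data = setosa_only)
coef(fit)
```

Editing the formula rather than the data frame leaves the original data untouched, which helps if the same data is reused for other models.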

Updated as per comments: Since the error occurs only with factor/character variables, you can focus on those and check whether each one has exactly one level (DROP) or more than one (NODROP).

To see which variables are factors, use the following code:

(l <- sapply(iris, is.factor))
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#        FALSE        FALSE        FALSE        FALSE         TRUE

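Then you can get a data frame of the factor variables only. Use drop = FALSE so that the result stays a data frame even when there is a single factor column:

m <- iris[, l, drop = FALSE]

Now find the number of levels in use for each factor variable. If it is one, you need to drop that variable:

# droplevels() ignores levels that no longer occur in the data
n <- sapply(m, function(x) nlevels(droplevels(x)))
ifelse(n == 1, "DROP", "NODROP")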

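Note: A factor with only one level in use is the variable you have to drop. After subsetting, levels() may still list levels that no longer occur in the data, so count the levels actually in use, for example with nlevels(droplevels(x)).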
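The check can also be run on the model frame itself, which reflects the rows lm() would actually use after removing incomplete cases. The sketch below is one possible approach, not something from the original answer: `single_level_terms` and the data frame `d` are hypothetical names, and it assumes the formula has a response so that the first column of the model frame can be skipped.

```r
# Return the names of factor/character predictors that take fewer than
# two distinct values in the rows lm() would use (hypothetical helper)
single_level_terms <- function(formula, data) {
  # Build the model frame: incomplete rows are removed by the default
  # na.action, and unused factor levels are dropped
  mf <- model.frame(formula, data = data, drop.unused.levels = TRUE)
  predictors <- mf[-1]  # assumes the first column is the response

  is_categorical <- vapply(predictors,
                           function(x) is.factor(x) || is.character(x),
                           logical(1))
  n_values <- vapply(predictors, function(x) length(unique(x)), integer(1))

  names(predictors)[is_categorical & n_values < 2]
}

# Single-species subset: Species is flagged
single_level_terms(Sepal.Length ~ Sepal.Width + Species,
                   iris[iris$Species == "setosa", ])

# Made-up data: g has two levels overall, but the rows with g == "b"
# are removed because x is missing there
d <- data.frame(y = c(1, 2, 3, 4),
                x = c(1, 2, NA, NA),
                g = factor(c("a", "a", "b", "b")))
single_level_terms(y ~ x + g, d)
```

The second call shows why checking the raw data frame can miss the problem: missing values in another column can leave a factor with a single level even though the full data has more.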

Metrics answered Sep 20 '22 23:09
