When I try to define my linear model in R as follows:
lm1 <- lm(predictorvariable ~ x1+x2+x3, data=dataframe.df)
I get the following error message:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels
Is there any way to ignore this or fix it? Some of the variables are factors and some are not.
This error occurs when you attempt to fit a regression model using a predictor variable that is either a factor or character and only has one unique value.
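One case that is easy to miss: lm() removes rows with missing values (the default na.action is na.omit) before it builds the model matrix. A factor can therefore have two levels in the full data but only one among the rows actually used. The sketch below shows this with a made-up data frame; the names df, y, x and g are illustrative only.

```r
# Toy data (hypothetical): level "b" of g only appears in rows where y is NA
df <- data.frame(
  y = c(1.2, 2.3, NA, 3.1, NA),
  x = c(0.5, 1.1, 1.9, 2.4, 3.0),
  g = factor(c("a", "a", "b", "a", "b"))
)

nlevels(df$g)                      # 2 levels in the full data
used <- df[complete.cases(df), ]   # the rows lm() keeps under na.omit
nlevels(droplevels(used$g))        # only 1 level left in those rows

# Fitting the model reproduces the error; catch it to inspect the message
msg <- tryCatch(
  lm(y ~ x + g, data = df),
  error = function(e) conditionMessage(e)
)
msg
```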
In statistics, particularly in analysis of variance and linear regression, a contrast is a linear combination of variables (parameters or statistics) whose coefficients add up to zero, allowing comparison of different treatments.
The contrasts() function returns the contrasts matrix of a factor x. If a contrasts attribute has been assigned with contrasts(x) <- contr, that is used; otherwise the matrix is computed from getOption("contrasts"): unordered factors use the first entry and ordered factors the second.
R provides five built-in contrast functions, and you can write your own. The default for unordered factors is contr.treatment(), which is by far the most frequently used. Other choices include contr.…
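To connect these defaults to the error message, the sketch below queries getOption("contrasts"), prints the default treatment contrasts for iris$Species, and then calls the `contrasts<-` replacement function (the one named in the error) on a one-level factor. The factor f is made up for illustration.

```r
# Default contrast functions: first entry for unordered, second for ordered factors
getOption("contrasts")

# Treatment contrasts for a 3-level factor: a 3 x 2 matrix with "setosa" as baseline
ct <- contrasts(iris$Species)
ct

# A factor with a single level cannot be given contrasts; this is the check
# that model.matrix() trips over inside lm()
f <- factor(c("setosa", "setosa"))
msg <- tryCatch({
  contrasts(f) <- "contr.treatment"
  "no error"
}, error = function(e) conditionMessage(e))
msg
```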
In other words, if an independent (right-hand-side) variable is a factor or character vector that takes only one value, this error occurs.
Example with the built-in iris data:
(model1 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris))
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
#
# Coefficients:
#       (Intercept)        Sepal.Width  Speciesversicolor   Speciesvirginica
#            2.2514             0.8036             1.4587             1.9468
Now, if your data consists of only one species:
(model1 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris[iris$Species == "setosa", ]))
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
#   contrasts can be applied only to factors with 2 or more levels
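Note that subsetting does not remove unused factor levels: the setosa-only subset still reports three levels for Species. lm() drops levels with no observations before building the model matrix, so only one level is left, and droplevels() shows what lm() effectively sees. The sketch also shows that removing the constant factor from the formula lets the model fit.

```r
setosa <- iris[iris$Species == "setosa", ]

nlevels(setosa$Species)              # still 3: unused levels survive subsetting
nlevels(droplevels(setosa$Species))  # 1: what lm() is left with

# Leaving the single-valued factor out of the formula avoids the error
model1b <- lm(Sepal.Length ~ Sepal.Width, data = setosa)
coef(model1b)
```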
If the variable is numeric (Sepal.Width) but takes only a single value, say 3, the model still runs, but you get NA as the coefficient of that variable:
(model2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris[iris$Sepal.Width == 3, ]))
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Species,
#     data = iris[iris$Sepal.Width == 3, ])
#
# Coefficients:
#       (Intercept)        Sepal.Width  Speciesversicolor   Speciesvirginica
#             4.700                 NA              1.250              2.017
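A constant numeric predictor is perfectly collinear with the intercept, which is why lm() reports NA instead of stopping. One way to spot such terms programmatically is to look for NA coefficients; alias() then reports the linear dependency behind them. A small sketch refitting the model from the text (sub3 is just a local name for the subset):

```r
sub3 <- iris[iris$Sepal.Width == 3, ]
model2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = sub3)

# Coefficients that could not be estimated are returned as NA
aliased <- names(which(is.na(coef(model2))))
aliased

# Show which terms the NA coefficient is aliased with
alias(model2)
```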
Solution: a predictor that takes only one value has no variation, so the model cannot estimate its effect. You need to drop that variable, whether it is numeric, character, or factor.
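Dropping such variables can be automated. The helper below, drop_constant_predictors(), is a hypothetical name written for this answer, not a base R function. As a sketch, it keeps only predictors with at least two distinct values among the complete cases and refits with reformulate():

```r
# Hypothetical helper: refit lm() without predictors that take a single value
drop_constant_predictors <- function(response, predictors, data) {
  # Restrict to the rows lm() would use (complete cases of the model variables)
  d <- data[complete.cases(data[, c(response, predictors)]), , drop = FALSE]
  # Count distinct values actually present; works for numeric, character, factor
  n_distinct <- vapply(d[predictors], function(x) length(unique(x)), integer(1))
  keep <- predictors[n_distinct > 1]
  if (length(keep) == 0) keep <- "1"  # fall back to an intercept-only model
  lm(reformulate(keep, response = response), data = d)
}

setosa <- iris[iris$Species == "setosa", ]
fit <- drop_constant_predictors("Sepal.Length", c("Sepal.Width", "Species"), setosa)
coef(fit)
```

Silently dropping terms changes the model you are fitting, so it is worth printing which predictors were removed before relying on the result.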
Update based on the comments: since the error occurs only with factor or character variables, you can focus on those and check whether each one has a single level (DROP) or more than one (NODROP).
To see which variables are factors, use the following code:
(l <- sapply(iris, is.factor))
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#        FALSE        FALSE        FALSE        FALSE         TRUE
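Then keep only the factor columns. drop = FALSE keeps the result a data frame even when only one column is selected, as with iris; without it, iris[, l] returns the bare Species vector and the next step loops over its 150 elements instead of over columns:
m <- iris[, l, drop = FALSE]
Now count the levels that actually occur in each factor column (droplevels() removes empty levels, as lm() does, so a subset such as the setosa-only data counts 1 rather than 3). If the count is one, that variable has to be dropped:
n <- sapply(m, function(x) nlevels(droplevels(x)))
ifelse(n == 1, "DROP", "NODROP")
#  Species
# "NODROP"
Note: a factor variable with only one level present in the data is the variable you have to drop.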
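The same check can be extended to character columns, which lm() converts to factors. A sketch, using a made-up helper name check_single_valued() and run on both the full iris data and the setosa-only subset that triggered the error:

```r
# Hypothetical helper: flag factor/character columns with fewer than 2 distinct values
check_single_valued <- function(data) {
  is_cat <- vapply(data, function(x) is.factor(x) || is.character(x), logical(1))
  cat_cols <- data[, is_cat, drop = FALSE]
  # unique() counts only values actually present (unlike levels()); NA is excluded
  n <- vapply(cat_cols, function(x) length(unique(x[!is.na(x)])), integer(1))
  ifelse(n < 2, "DROP", "NODROP")
}

check_single_valued(iris)                              # Species is "NODROP"
check_single_valued(iris[iris$Species == "setosa", ])  # Species is "DROP"
```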