Running a linear model in R with spreadsheet data

Tags:

r

I have a dataset of 106 individuals of two types, a and b, with various variables such as age and gender. I want to run a linear model that predicts whether each individual is of type a or type b based on the covariates.

I read in the values for age, gender and the type label for each individual using:

data = read.xlsx("spreadsheet.xlsx", 2, as.is = TRUE)
age = data$age
gender = data$gender
type = data$type

where each is of the form:

age = [28, 30, 19, 23 etc]
gender = [male, male, female, male etc]
type = [a b b b]

Then I try to set up the model using:

model1 = lm(type ~ age + gender)

but I get these warning messages:

Warning messages:
1: In model.response(mf, "numeric") :
using type="numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : '-' not meaningful for factors

I've tried changing the format of type, age and gender using:

age = as.numeric(as.character(age))
gender = as.character(gender)
type = as.character(type)

But this doesn't work!

user2846211 asked Oct 14 '13 15:10


1 Answer

You can't fit a linear regression with a factor as the response variable, which is what you are attempting here (type is your response variable). lm() requires a numeric response. For a two-level outcome like this, you should look at classification models instead.

As Roland points out, you may wish to start by recoding your "type" variable as a logical (binary) variable. Rather than a factor called "type" with two levels "a" and "b", you might create a new variable called "is.type.a", which would contain TRUE or FALSE.
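As a rough sketch of that recoding step: the data frame below is made up to mimic the columns described in the question (it is not the real spreadsheet), and the comparison with "a" is just one way to build the logical variable.

```r
# Hypothetical toy data standing in for the spreadsheet columns
data <- data.frame(
  age    = c(28, 30, 19, 23),
  gender = c("male", "male", "female", "male"),
  type   = c("a", "b", "b", "b"),
  stringsAsFactors = TRUE
)

# TRUE where the individual is of type a, FALSE otherwise.
# The comparison works whether type is a factor or a character vector.
data$is.type.a <- data$type == "a"

str(data$is.type.a)
```

Keeping the new column inside the data frame, rather than as a loose vector, lets you pass data= to the modelling function later.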

You could then try a logistic regression, which models the response with a binomial distribution:

model <- glm(is.type.a ~ age + gender,data=data,family="binomial")
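
To see how such a model could be turned back into a type a / type b prediction, here is a self-contained sketch. The data are simulated (the real spreadsheet isn't available), the relationship between age and type is invented purely for illustration, and the 0.5 cutoff is an arbitrary choice rather than a recommendation.

```r
set.seed(1)  # simulated data, for illustration only
n <- 106
data <- data.frame(
  age    = round(runif(n, 18, 60)),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Invented assumption: the chance of being type a rises slightly with age
p <- plogis(-2 + 0.05 * data$age)
data$is.type.a <- runif(n) < p

# Logistic regression on the logical response
model <- glm(is.type.a ~ age + gender, data = data, family = "binomial")

# Fitted probability that each individual is of type a
prob.a <- predict(model, type = "response")

# Convert probabilities to labels using a 0.5 cutoff (arbitrary)
predicted.type <- ifelse(prob.a > 0.5, "a", "b")

# Cross-tabulate predicted against actual labels
table(predicted = predicted.type, actual = ifelse(data$is.type.a, "a", "b"))
```

summary(model) will show the estimated coefficients on the log-odds scale if you want to see which covariates matter.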
mac answered Sep 20 '22 17:09