How can I manually obtain predict() values from the coef() and model.matrix() results of a linear model?

Tags:

r

I am having a LOT of trouble figuring out how to compute predicted values from a model's coefficients and its model matrix. I'm hoping someone can help.

I am setting up a linear model with two independent variables, e.g.

data <- data.frame(d1,d2,d3)
lm.data <- lm(d1~d2*d3,data)

I can now get the coefficient vector

co.data <- coef(lm.data)

I can also now easily get the model matrix

mm.data <- model.matrix(lm.data)

This is where I get lost! I am trying to teach myself how to reproduce the values I get from predict(lm.data) using the coefficients. In other words, I know the predicted values of the model can be calculated from the design matrix and the coefficients, but after 48 hours of working on this, I truly have no idea how.

Any help would be amazing.


asked Jul 15 '15 00:07

Sean

1 Answer

You just need to know how a linear model works. If your formula is d1 ~ d2 * d3 and all the variables are numeric, then the prediction is (intercept) + (d2 coefficient)*x_d2 + (d3 coefficient)*x_d3 + (d2:d3 coefficient)*x_d2*x_d3, which gives you the predicted d1.

Here's a reproducible example:

data(iris)
m <- lm(Sepal.Length ~ Petal.Length * Sepal.Width, iris)
co.data <- coef(m)

# we'll predict the sepal length for these petal lengths and sepal widths
# (no seed is set, so your random inputs and the numbers below will differ):
x.pl <- runif(5, min=1, max=2)
x.sw <- runif(5, min=2, max=5)
y.predicted <- predict(m, data.frame(Petal.Length=x.pl, Sepal.Width=x.sw))
#        1        2        3        4        5 
# 5.379006 5.495907 5.296913 4.382487 5.131850

Now to do it manually, let's look at the coefficients:

co.data
#              (Intercept)             Petal.Length              Sepal.Width Petal.Length:Sepal.Width
#               1.40438275               0.71845958               0.84995691              -0.07701327

According to the formula above:

y <- co.data[1] + co.data[2]*x.pl + co.data[3] * x.sw + co.data[4]*x.pl*x.sw
# [1] 5.379006 5.495907 5.296913 4.382487 5.131850
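
The agreement with predict() can also be checked programmatically instead of by eye. This is a minimal sketch, not part of the original answer: the seed value (42) and the names y.predict and y.manual are arbitrary choices made here, so the random inputs will differ from the numbers shown above.

```r
# Self-contained check; the seed is an arbitrary choice for repeatability
set.seed(42)
m  <- lm(Sepal.Length ~ Petal.Length * Sepal.Width, data = iris)
co <- coef(m)

x.pl <- runif(5, min = 1, max = 2)
x.sw <- runif(5, min = 2, max = 5)

# predict() returns a named vector, so drop the names before comparing
y.predict <- unname(predict(m, data.frame(Petal.Length = x.pl, Sepal.Width = x.sw)))

# Hand-computed prediction from the four coefficients
y.manual <- unname(co[1] + co[2] * x.pl + co[3] * x.sw + co[4] * x.pl * x.sw)

# all.equal() compares up to floating-point tolerance; this should be TRUE
isTRUE(all.equal(y.predict, y.manual))
```

all.equal() is used rather than == because the two computations may differ in the last few bits.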

Rather than writing it out manually you can do something like:

# x is a matrix with columns 1, petal length, sepal width, pl*sw
# (matches order of co.data)
x <- cbind(1, matrix(c(x.pl, x.sw, x.pl*x.sw), ncol=3))
x %*% co.data
#          [,1]
# [1,] 5.379006
# [2,] 5.495907
# [3,] 5.296913
# [4,] 4.382487
# [5,] 5.131850
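
Since the question asks specifically about model.matrix(), here is a sketch of the same idea using it directly. For the data the model was fitted on, the design matrix times the coefficient vector should reproduce predict(m). For new data, one way to build the matching design matrix is from the model's terms with the response removed. The newdata values below are made up for illustration, and the sketch assumes a full-rank fit (no NA coefficients).

```r
m  <- lm(Sepal.Length ~ Petal.Length * Sepal.Width, data = iris)
co <- coef(m)

# 1. Fitted values for the original data: design matrix times coefficients
X <- model.matrix(m)            # columns are in the same order as coef(m)
fit.manual <- drop(X %*% co)    # drop() turns the n x 1 matrix into a vector
ok.fit <- isTRUE(all.equal(unname(fit.manual), unname(predict(m))))

# 2. Predictions for new data: rebuild the design matrix from the model terms
newdata <- data.frame(Petal.Length = c(1.2, 1.5, 1.8),   # hypothetical values
                      Sepal.Width  = c(2.5, 3.0, 3.5))
X.new <- model.matrix(delete.response(terms(m)), newdata)
new.manual <- drop(X.new %*% co)
ok.new <- isTRUE(all.equal(unname(new.manual), unname(predict(m, newdata))))

# Both entries should be TRUE
c(fitted = ok.fit, new = ok.new)
```

Letting model.matrix() build the columns also covers factor predictors, where it creates the dummy columns for you, so the same X %*% co step applies without writing the formula out by hand.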


answered Nov 03 '22 15:11

mathematical.coffee
