Individual terms in prediction of linear regression

Tags:

r

linear-regression

prediction

I ran a regression analysis in R on a dataset and am trying to compute the contribution of each independent variable to the dependent variable for each row in the dataset.

So something like this:

set.seed(123)
y <- rnorm(10)
m <- data.frame(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10))
regr <- lm(formula = y ~ v1 + v2 + v3, data = m)
summary(regr)
terms <- predict.lm(regr, m, type = "terms")

In short: run a regression and use the predict function to calculate the terms for v1, v2 and v3 in dataset m. But I am having a hard time understanding what the predict function is calculating. I would expect it to multiply each regression coefficient by the corresponding variable's data. So something like this for v1:

coefficients(regr)[2]*m$v1

But that gives different results compared to the predict function.

Own calculation:

0.55293884  0.16253411  0.18103537  0.04999729 -0.25108302  0.80717945  0.22488764 -0.88835486  0.31681455 -0.21356803

And predict function calculation:

0.45870070  0.06829597  0.08679724 -0.04424084 -0.34532115  0.71294132  0.13064950 -0.98259299  0.22257641 -0.30780616

The predict function is off by about 0.1. Also, if you add all the terms from the predict function together with the constant, it doesn't add up to the total prediction (using type="response"). What does the predict function calculate here, and how can I tell it to calculate what I did with coefficients(regr)[2]*m$v1?

asked Dec 17 '17 09:12

Tall Measure

1 Answer

All the following lines result in the same predictions:

# our computed predictions
coefficients(regr)[1] + coefficients(regr)[2]*m$v1 +
  coefficients(regr)[3]*m$v2 + coefficients(regr)[4]*m$v3

# prediction using predict function
predict.lm(regr,m)

# prediction using terms matrix, note that we have to add the constant.
terms_predict = predict.lm(regr,m, type="terms")
terms_predict[,1]+terms_predict[,2]+terms_predict[,3]+attr(terms_predict,'constant')
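
To check the claim numerically rather than by eye, the three results can be compared with all.equal(), which allows for floating-point tolerance. This is only a sketch: it rebuilds the example from the question so it runs on its own, and the names by_hand, by_predict and by_terms are made up for illustration.

```r
# Rebuild the example from the question so the snippet is self-contained.
set.seed(123)
y <- rnorm(10)
m <- data.frame(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10))
regr <- lm(y ~ v1 + v2 + v3, data = m)

# 1. Intercept plus coefficient * variable, computed by hand.
b <- coef(regr)
by_hand <- unname(b[1] + b["v1"] * m$v1 + b["v2"] * m$v2 + b["v3"] * m$v3)

# 2. The default prediction (type = "response").
by_predict <- unname(predict(regr, m))

# 3. Row sums of the terms matrix plus its "constant" attribute.
terms_predict <- predict(regr, m, type = "terms")
by_terms <- unname(rowSums(terms_predict) + attr(terms_predict, "constant"))

# Both comparisons are expected to be TRUE.
isTRUE(all.equal(by_hand, by_predict))
isTRUE(all.equal(by_hand, by_terms))
```

unname() is used because predict() and rowSums() return named vectors, and all.equal() would otherwise also compare the names.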

You can read more about type="terms" in the R help page for predict.lm (?predict.lm).

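Your own calculation (coefficients(regr)[2]*m$v1) and the predict function calculation (terms_predict[,1]) differ because the columns of the terms matrix are centered: each term has its mean (taken over the data used to fit the model) subtracted, so every column has mean zero. The subtracted means, together with the intercept, make up the 'constant' attribute, which is why it has to be added back when summing the terms. For v1: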

# this is equal to terms_predict[,1]
coefficients(regr)[2]*m$v1-mean(coefficients(regr)[2]*m$v1)

# indeed, all columns are centered; i.e. have a mean of 0.
round(sapply(as.data.frame(terms_predict),mean),10)
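
As far as I know, predict has no option to switch the centering off, so one way to get the uncentered contributions from the question is to multiply the columns of the design matrix by the coefficients yourself. The sketch below assumes purely numeric predictors without interactions, as in the example. The names X, contrib and const_by_hand are made up for illustration.

```r
# Rebuild the example from the question so the snippet is self-contained.
set.seed(123)
y <- rnorm(10)
m <- data.frame(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10))
regr <- lm(y ~ v1 + v2 + v3, data = m)

# Design matrix of the fit: a column of ones for the intercept, then v1, v2, v3.
X <- model.matrix(regr)

# Uncentered contributions: scale every column by its coefficient.
contrib <- sweep(X, 2, coef(regr), FUN = "*")

# The v1 column matches the question's coefficients(regr)[2] * m$v1.
all.equal(unname(contrib[, "v1"]), coef(regr)[["v1"]] * m$v1)

# Row sums (intercept column included) reproduce the fitted values.
all.equal(unname(rowSums(contrib)), unname(fitted(regr)))

# The "constant" attribute of the terms matrix is the intercept plus
# each coefficient times the mean of its column in the design matrix.
terms_predict <- predict(regr, m, type = "terms")
const_by_hand <- sum(coef(regr) * colMeans(X))
all.equal(attr(terms_predict, "constant"), const_by_hand)
```

For new data, model.matrix(delete.response(terms(regr)), newdata) should give the matching design matrix. With factors or interactions a single term spans several columns, so the columns belonging to one term would have to be summed.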

Hope this helps.

answered Sep 27 '22 21:09

Florian
