
Run a regression on certain parts of a data frame and extract estimates + errors

I am trying to run several regressions on a selected part of a data frame. There are 22 columns: one is "DATE", one is "INDEX", and the rest are S1, S2, S3 ... S20.

I run the regression this way:

Regression <- lm(as.matrix(df[c('S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10', 'S11', 'S12', 'S13', 'S14', 'S15', 'S16', 'S17', 'S18', 'S19', 'S20')]) ~ df$INDEX)
Regression$coefficients

1) How can I make the code shorter? For example, by using a range to tell R: take columns S1 to S20 as dependent variables and regress each of them on the explanatory variable INDEX.

2) The regression formula is: a + b*INDEX + error. I then want to extract all the "b" estimates from the regression. Let's say the columns have 10 rows, so there must be 10 estimates. I also want to extract all the errors: there must be 10 errors in each column, for a total of 10*20=200 errors.

Since I have no experience with R, any kind of help is welcome! Thank you!

Consti asked Mar 09 '23 01:03


2 Answers

If you have 22 columns, just use the positions of the columns in the data frame. Using the same dataset as LAP in his answer:

# load iris dataset
data(iris)
# run regression
Regression <- lm(as.matrix(iris[1:3]) ~ Petal.Width, data = iris)

This would, in your case, translate to something like:

# run the regression
Regression <- lm(as.matrix(df[3:22]) ~ INDEX, data = df)

This assumes your dependent variables are in columns 3 to 22 (for example, the first column is the date and the second is the index).
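As a rough check of the position-based approach, here is a minimal sketch on made-up data. The data frame, its random values, and the names by_position and by_name are assumptions for illustration only; the column layout (DATE, INDEX, then S1 to S20, with 10 rows) follows the question.

```r
# Hypothetical data: 10 rows, a DATE column, an INDEX column, and S1..S20
set.seed(1)
df <- data.frame(
  DATE  = seq(as.Date("2023-01-01"), by = "day", length.out = 10),
  INDEX = rnorm(10)
)
for (i in 1:20) df[[paste0("S", i)]] <- rnorm(10)

# Select the response columns by position ...
by_position <- lm(as.matrix(df[3:22]) ~ INDEX, data = df)
# ... or by name, which does not depend on the column order
by_name <- lm(as.matrix(df[paste0("S", 1:20)]) ~ INDEX, data = df)

# Each fit holds one column of coefficients per S series
dim(coef(by_position))
all.equal(coef(by_position), coef(by_name))
```

Selecting by position is shorter, but it silently picks the wrong columns if the data frame is ever reordered, so selecting by name may be the safer choice.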

ira answered Mar 10 '23 13:03


You could shorten your code substantially by using paste0() instead of manually writing out all your column names:

Regression <- lm(as.matrix(df[paste0("S", 1:20)]) ~ df$INDEX)

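To access the fitted values, use Regression$fitted.values; for the errors (residuals), use Regression$residuals. The slope estimates themselves (the "b" in your formula) are in the second row of Regression$coefficients.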

Example using the iris data:

data(iris)
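# cbind() gives a two-column response; Sepal.Length + Sepal.Width would fit their sum instead
Regression <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Length, data = iris)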

head(Regression$fitted.values)
  Sepal.Length Sepal.Width
1     4.879095    3.306775
2     4.879095    3.306775
3     4.838202    3.317354
4     4.919987    3.296197
5     4.879095    3.306775
6     5.001771    3.275039

head(Regression$residuals)
  Sepal.Length Sepal.Width
1    0.2209054   0.1932249
2    0.0209054  -0.3067751
3   -0.1382024  -0.1173536
4   -0.3199868  -0.1961965
5    0.1209054   0.2932249
6    0.3982287   0.6249605
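
Applied to the layout in the question, the extraction might look like the sketch below. The data are simulated and the names fit, b, and errors are assumptions for illustration only. With 10 rows and 20 S columns, you get one slope per column (20 in total) and a 10 x 20 matrix of residuals (200 values).

```r
# Hypothetical data: an INDEX column and S1..S20, 10 rows each
set.seed(42)
df <- data.frame(INDEX = rnorm(10))
for (i in 1:20) df[[paste0("S", i)]] <- 0.5 * df$INDEX + rnorm(10)

# One regression per S column: S_k = a + b * INDEX + error
fit <- lm(as.matrix(df[paste0("S", 1:20)]) ~ INDEX, data = df)

# Slope estimates "b": the INDEX row of the coefficient matrix
b <- coef(fit)["INDEX", ]

# Errors (residuals): one row per observation, one column per S series
errors <- residuals(fit)

length(b)
dim(errors)
```

Note that the row is named "INDEX" only because the formula uses INDEX together with data = df; with df$INDEX in the formula, the row would be named "df$INDEX" instead.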

LAP answered Mar 10 '23 14:03