
R: how to perform more complex calculations from a combn of a dataset?

Tags: loops, r, combn

Right now, I have the column pairs produced by combn() on the built-in iris dataset. So far, with some guidance, I have been able to get the slope coefficient from lm() for each pair of columns.

myPairs <- combn(names(iris[1:4]), 2)

formula <- apply(myPairs, MARGIN=2, FUN=paste, collapse="~")

model <- lapply(formula, function(x) lm(formula=x, data=iris)$coefficients[2])

model

However, I would like to go a few steps further and use the coefficient from lm() in further calculations. I would like to do something like this:

coefficient <- lm(formula=x, data=iris)$coefficients[2]
Spread <- myPairs[1] - coefficient*myPairs[2]
library(tseries)
adf.test(Spread)

The procedure itself is simple enough, but I haven't been able to find a way to do this for each pair from combn in the dataset. (As a side note, adf.test would not normally be applied to data like this; I'm just using the iris dataset for demonstration.) Would it be better to write a loop for such a procedure?

Luke Zhang asked Jun 15 '16 17:06



1 Answer

You can do all of this within combn.

If you only want to run the regression over all combinations and extract the second coefficient (the slope), you could do:

fun <- function(x) coef(lm(paste(x, collapse="~"), data=iris))[2]
combn(names(iris[1:4]), 2, fun)
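The vector returned above does not say which pair each slope belongs to. As a minimal sketch of one way to label the results (the names col_pairs, slope and slopes are made up for illustration, and reformulate() is swapped in for paste() to build the formula), the pairs and slopes can be put side by side in a data frame:

```r
# All pairs of the four numeric iris columns, one pair per matrix column
col_pairs <- combn(names(iris)[1:4], 2)

# Slope of lm(first ~ second) for one pair of column names
slope <- function(x) {
  fit <- lm(reformulate(x[2], response = x[1]), data = iris)
  unname(coef(fit)[2])
}

# One row per pair: which columns were used and the fitted slope
slopes <- data.frame(
  response  = col_pairs[1, ],
  predictor = col_pairs[2, ],
  slope     = apply(col_pairs, 2, slope)
)
slopes
```

Each row follows the same convention as paste(x, collapse="~"): the first name in the pair is the response and the second is the predictor.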

You can then extend the function to calculate the spread and run the test on it:

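library(tseries)  # provides adf.test()

fun <- function(x) {
  # slope of lm(x[1] ~ x[2])
  est <- coef(lm(paste(x, collapse="~"), data=iris))[2]
  # spread: first column minus slope times second column
  spread <- iris[,x[1]] - est*iris[,x[2]]
  adf.test(spread)
}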

out <- combn(names(iris[1:4]), 2, fun, simplify=FALSE)
out[[1]]

#   Augmented Dickey-Fuller Test

# data:  spread
# Dickey-Fuller = -3.879, Lag order = 5, p-value = 0.01707
# alternative hypothesis: stationary

Compare the result with running the first pair manually:

est <- coef(lm(Sepal.Length ~ Sepal.Width, data=iris))[2]
spread <- iris[,"Sepal.Length"] - est*iris[,"Sepal.Width"]
adf.test(spread)

#   Augmented Dickey-Fuller Test

# data:  spread
# Dickey-Fuller = -3.879, Lag order = 5, p-value = 0.01707
# alternative hypothesis: stationary
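
If you would rather see every pair in one table than index into the list, something along these lines might work. This is only a sketch: spread_test and result_table are hypothetical names, and the test defaults to stats::PP.test (a different unit-root test, used here purely so the example runs without extra packages). Pass test = tseries::adf.test to reproduce the approach above.

```r
# Fit lm(x[1] ~ x[2]), build the spread, run a unit-root test on it,
# and return a one-row summary for that pair.
# `test` defaults to stats::PP.test as a stand-in (an assumption);
# use tseries::adf.test to match the answer.
spread_test <- function(x, data = iris, test = stats::PP.test) {
  est    <- coef(lm(reformulate(x[2], response = x[1]), data = data))[2]
  spread <- data[[x[1]]] - est * data[[x[2]]]
  res    <- test(spread)
  data.frame(
    response  = x[1],
    predictor = x[2],
    slope     = unname(est),
    statistic = unname(res$statistic),
    p.value   = res$p.value
  )
}

# One data frame per pair, then stacked into a single table
rows <- combn(names(iris)[1:4], 2, spread_test, simplify = FALSE)
result_table <- do.call(rbind, rows)
result_table
```

This also addresses the loop question: combn applies the function to each pair for you, so an explicit loop is not needed unless you prefer one for readability.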
user20650 answered Oct 11 '22 15:10
