Finding non-linear correlations in R

Question

I have about 90 variables stored in data[2:90]. I suspect about 4 of them have a parabola-like relationship with data[1]. I want to identify which ones those are. Is there a quick and easy way to do this?

I have tried building a model like this (which I could do in a loop for each variable i = 2:90):

y <- data$AvgRating
x <- data$Hamming.distance
x2 <- x^2

quadratic.model = lm(y ~ x + x2)

And then look at the R^2/coefficient to get an idea of the correlation. Is there a better way of doing this?
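The loop described above could be sketched roughly as follows. This is only an illustration on simulated data: the data frame, the column names (y, v2 to v10) and the choice of Holm adjustment are assumptions, not part of the original problem.

```r
# Simulated stand-in for the real data: column 1 is the response,
# the remaining columns are candidate predictors (all names hypothetical).
set.seed(1)
n <- 200
data <- data.frame(matrix(rnorm(n * 10), nrow = n))
names(data) <- c("y", paste0("v", 2:10))
# Make the response depend quadratically on v3 and v7 only
data$y <- data$v3^2 - 0.8 * data$v7^2 + rnorm(n, sd = 0.5)

# Fit y ~ x + x^2 for each predictor and record the fit statistics
screen <- do.call(rbind, lapply(names(data)[-1], function(v) {
  x <- data[[v]]
  fit <- summary(lm(data$y ~ x + I(x^2)))
  data.frame(variable = v,
             r.squared = fit$r.squared,
             p.quadratic = fit$coefficients["I(x^2)", "Pr(>|t|)"])
}))

# Many tests are run, so adjust the p-values for multiplicity,
# then rank the predictors by R^2 (largest first)
screen$p.adjusted <- p.adjust(screen$p.quadratic, method = "holm")
screen <- screen[order(-screen$r.squared), ]
print(head(screen))
```

Writing the squared term as I(x^2) inside the formula avoids creating a separate x2 vector for every variable.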

Maybe R could build a regression model with the 90 variables and choose the significant ones itself? Would that be possible in any way? I can do this in JMP for linear regression, but I'm not sure I could do non-linear regression in R for all the variables at once. That is why I was trying to check manually, in advance, which ones are correlated. It would be helpful if there were a function for that.

vahab najari · Accepted Answer

You can use the nlcor package in R. This package estimates the nonlinear correlation between two data vectors. There are other approaches to estimating a nonlinear correlation, such as the infotheo package. However, a nonlinear relationship between two variables can take any shape.

nlcor is robust to most nonlinear shapes. It works pretty well in different scenarios.

At a high level, nlcor works by adaptively segmenting the data into linearly correlated segments. The segment correlations are then aggregated to yield the nonlinear correlation. The output is a number between 0 and 1, where values close to 1 mean high correlation. Unlike a Pearson correlation, negative values are not returned, because a sign has no meaning for a nonlinear relationship.
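To make the segment-and-aggregate idea concrete, here is a minimal base R sketch. It is not nlcor's actual algorithm: it uses a fixed number of equal-width segments instead of an adaptive search, and the function name segment_cor and its parameter k are made up for this illustration.

```r
# Simplified illustration only: fixed equal-width segments along x,
# not the adaptive segmentation that nlcor performs.
segment_cor <- function(x, y, k = 6) {
  # Assign every point to one of k equal-width bins along x
  bins <- cut(x, breaks = k, labels = FALSE)
  groups <- split(seq_along(x), bins)
  # Absolute Pearson correlation inside each bin (skip very small bins)
  per_bin <- vapply(groups, function(idx) {
    if (length(idx) < 3) return(NA_real_)
    abs(cor(x[idx], y[idx]))
  }, numeric(1))
  # Aggregate the segment correlations, weighted by segment size
  weights <- lengths(groups)
  keep <- !is.na(per_bin)
  sum(per_bin[keep] * weights[keep]) / sum(weights[keep])
}

x <- seq(0, 3 * pi, length.out = 100)
y <- sin(x)
linear_estimate <- cor(x, y)               # essentially zero for this curve
segment_estimate <- segment_cor(x, y, k = 6)  # high: each segment is near-linear
print(c(linear = linear_estimate, segmented = segment_estimate))
```

The result depends heavily on where the segment boundaries fall. With k = 3 each segment of this sine curve is symmetric around its own midpoint and the estimate collapses towards zero, which is the kind of problem an adaptive choice of segments is meant to avoid.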

More details about this package here

To install nlcor, follow these steps:

install.packages("devtools") 
library(devtools)
install_github("ProcessMiner/nlcor")
library(nlcor)

After installing it, you can use it like this:

# Implementation 
x <- seq(0,3*pi,length.out=100)
y <- sin(x)
plot(x,y,type="l")

[Figure: plot of sin(x)]

# linear correlation is small
cor(x,y)
# [1] 6.488616e-17
# nonlinear correlation is more representative
nlcor(x, y, plt = TRUE)
# $cor.estimate
# [1] 0.9774
# $adjusted.p.value
# [1] 1.586302e-09
# $cor.plot

[Figure: nlcor output plot for sin(x)]

As the example shows, the linear correlation is close to zero even though there is a clear relationship between the variables, which nlcor detects.

Note: The order of the arguments to nlcor matters: nlcor(x, y) is not the same as nlcor(y, x). Here x and y represent the 'independent' and 'dependent' variables, respectively.

George Dontas · Answer

Fitting a generalized additive model will help you identify curvature in the relationships between the response and the explanatory variables. Read the example on page 22 here.
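As a rough sketch of that suggestion, the following fits a GAM with the mgcv package on simulated data. The variables x1, x2 and y are invented for the example, and using the effective degrees of freedom (edf) as a curvature indicator is a rule of thumb rather than a formal test.

```r
library(mgcv)  # recommended package that ships with standard R installations

# Simulated data (hypothetical): x1 has a parabola-like effect, x2 a linear one
set.seed(42)
n <- 300
x1 <- runif(n, -2, 2)
x2 <- runif(n, -2, 2)
y <- x1^2 + 0.5 * x2 + rnorm(n, sd = 0.3)

# One smooth term per predictor; REML estimates how wiggly each smooth is
fit <- gam(y ~ s(x1) + s(x2), method = "REML")

# edf near 1 suggests a roughly linear effect,
# edf clearly above 1 suggests curvature
smooth_table <- summary(fit)$s.table
print(smooth_table)

# plot(fit, pages = 1) draws the estimated smooths for a visual check
```

With around 90 predictors, a model containing one smooth per variable needs a fairly large number of observations, so fitting the variables in smaller groups may be more practical.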

Tags: r, regression, non-linear-regression

dorien
