
Efficient Multiple Linear Regression in C# / .Net

Does anyone know of an efficient way to do multiple linear regression in C#, where the number of simultaneous equations may be in the thousands (with 3 or 4 different inputs)? After reading this article on multiple linear regression, I tried implementing it as a matrix equation:

Matrix y = new Matrix(
    new double[,]{{745},
                  {895},
                  {442},
                  {440},
                  {1598}});

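// First column of ones gives the intercept term
Matrix x = new Matrix(
    new double[,]{{1, 36, 66},
                  {1, 37, 68},
                  {1, 47, 64},
                  {1, 32, 53},
                  {1,  1, 101}});

// Ordinary least squares via the normal equations: b = (X^T X)^-1 X^T y
Matrix b = (x.Transpose() * x).Inverse() * x.Transpose() * y;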

for (int i = 0; i < b.Rows; i++)
{
  Trace.WriteLine("INFO: " + b[i, 0].ToDouble());
}
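For comparison, here is a minimal, dependency-free sketch of the same normal-equation fit using plain double arrays. It forms X^T X and X^T y, then solves (X^T X) b = X^T y by Gaussian elimination with partial pivoting instead of computing the inverse explicitly, which is cheaper and numerically better behaved. The class and method names (NormalEquations, Fit) are hypothetical and are not part of the Matrix library used above.

```csharp
using System;

public static class NormalEquations
{
    // Hypothetical helper: least-squares fit of y against the columns of x.
    // Solves (X^T X) b = X^T y by Gaussian elimination; no explicit inverse.
    public static double[] Fit(double[,] x, double[] y)
    {
        int n = x.GetLength(0), p = x.GetLength(1);
        if (y.Length != n) throw new ArgumentException("x and y must have the same number of rows");

        // Augmented p x (p+1) system [X^T X | X^T y]; its size depends only on p.
        var a = new double[p, p + 1];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < p; j++)
            {
                for (int k = 0; k < p; k++)
                    a[j, k] += x[i, j] * x[i, k];
                a[j, p] += x[i, j] * y[i];
            }

        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++)
        {
            int pivot = col;
            for (int r = col + 1; r < p; r++)
                if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
            // Crude singularity check (e.g. perfectly collinear inputs).
            if (Math.Abs(a[pivot, col]) < 1e-12)
                throw new InvalidOperationException("X^T X is singular");
            if (pivot != col)
                for (int c = 0; c <= p; c++)
                    (a[col, c], a[pivot, c]) = (a[pivot, c], a[col, c]);
            for (int r = col + 1; r < p; r++)
            {
                double f = a[r, col] / a[col, col];
                for (int c = col; c <= p; c++)
                    a[r, c] -= f * a[col, c];
            }
        }

        // Back substitution gives the coefficients b[0..p-1].
        var b = new double[p];
        for (int r = p - 1; r >= 0; r--)
        {
            double s = a[r, p];
            for (int c = r + 1; c < p; c++) s -= a[r, c] * b[c];
            b[r] = s / a[r, r];
        }
        return b;
    }
}
```

Called with the question's x (including the column of ones) and y, `NormalEquations.Fit(x, y)` returns b[0] as the intercept and b[1], b[2] as the coefficients of the two inputs.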

However, it does not scale well to thousands of equations due to the matrix inversion operation. I could call out to R and use that, but I was hoping there would be a pure .NET solution that scales to these large sets.

Any suggestions?

EDIT #1:

I have settled on using R for the time being. Using statconn (downloaded here), I have found this method to be both fast and relatively easy to use. Here is a small code snippet; it really isn't much code to use the R statconn library (note: this is not all the code!).

// Fit the model in R; 'equation' holds an lm() formula string
_StatConn.EvaluateNoReturn(string.Format("output <- lm({0})", equation));

// Read back the intercept, then the coefficient of each input column x0, x1, ...
object intercept = _StatConn.Evaluate("coefficients(output)['(Intercept)']");
parameters[0] = (double)intercept;
for (int i = 0; i < xColCount; i++)
{
  object parameter = _StatConn.Evaluate(string.Format("coefficients(output)['x{0}']", i));
  parameters[i + 1] = (double)parameter;
}
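The snippet does not show how `equation` is built. Since the coefficients are looked up as `x0`, `x1`, ..., the formula presumably names the inputs that way. A minimal sketch of a hypothetical helper that produces such a formula, assuming the response is a variable named by `responseName` and the inputs x0..x{n-1} already exist in the R session:

```csharp
using System;
using System.Linq;

public static class RFormula
{
    // Hypothetical helper: builds an R lm() formula such as "y ~ x0 + x1 + x2".
    // Assumes the inputs are named x0..x{inputCount-1} in the R session.
    public static string Build(string responseName, int inputCount)
    {
        if (inputCount < 1) throw new ArgumentOutOfRangeException(nameof(inputCount));
        var terms = Enumerable.Range(0, inputCount).Select(i => "x" + i);
        return responseName + " ~ " + string.Join(" + ", terms);
    }
}
```

For example, `RFormula.Build("y", xColCount)` would give a formula whose coefficient names match the `x{0}` lookups in the loop.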
mike asked May 26 '10 05:05


3 Answers

For the record, I recently found the ALGLIB library, which, while not having much documentation, has some very useful functions, such as linear regression, which is one of the things I was after.

Sample code (this is old and unverified, just a basic example of how I was using it). I was using linear regression on a time series with 3 entries (called 3min/2min/1min) and the finishing value (Final).

public Dictionary<string, double> Foo(List<Sample> samples)
{
  int nAttributes = 3; // columns: T-1min, T-2min, T-3min
  int nSamples = samples.Count;
  double[,] tsData = new double[nSamples, nAttributes]; // one row per sample
  double[] resultData = new double[nSamples];           // observed Final values

  for (int i = 0; i < nSamples; i++)
  {
    tsData[i, 0] = samples[i].Tminus1min;
    tsData[i, 1] = samples[i].Tminus2min;
    tsData[i, 2] = samples[i].Tminus3min;

    resultData[i] = samples[i].Final;
  }

  // Least-squares fit of resultData against the columns of tsData (old ALGLIB API)
  double[] weights = null;
  int fitResult = 0; // completion code set by ALGLIB; check it before using weights
  alglib.lsfit.lsfitreport rep = new alglib.lsfit.lsfitreport();
  alglib.lsfit.lsfitlinear(resultData, tsData, nSamples, nAttributes, ref fitResult, ref weights, rep);

  // weights[j] is the coefficient of column j of tsData
  Dictionary<string, double> labelsAndWeights = new Dictionary<string, double>();
  labelsAndWeights.Add("1min", weights[0]);
  labelsAndWeights.Add("2min", weights[1]);
  labelsAndWeights.Add("3min", weights[2]);
  return labelsAndWeights;
}
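As written, the fit above has no intercept: lsfitlinear combines exactly the columns it is given, and tsData contains no column of ones. A minimal sketch of a hypothetical helper that prepends such a column:

```csharp
using System;

public static class DesignMatrix
{
    // Hypothetical helper: returns a copy of data with an extra leading column of ones,
    // so a linear least-squares fit on the result also estimates an intercept.
    public static double[,] WithIntercept(double[,] data)
    {
        int rows = data.GetLength(0), cols = data.GetLength(1);
        var result = new double[rows, cols + 1];
        for (int i = 0; i < rows; i++)
        {
            result[i, 0] = 1.0; // intercept column
            for (int j = 0; j < cols; j++)
                result[i, j + 1] = data[i, j]; // original columns shifted right by one
        }
        return result;
    }
}
```

Passing the widened array with `nAttributes + 1` as the column count should make weights[0] the intercept and shift the other weights up by one index. This has not been checked against any particular ALGLIB version.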
mike answered Sep 18 '22 20:09


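The size of the matrix being inverted does NOT grow with the number of simultaneous equations (samples). x.Transpose() * x is a square matrix whose dimension is the number of independent variables (the columns of x), so inverting it is cheap; the part that grows with the sample count is forming the product itself, which is linear in the number of rows.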
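To illustrate, here is a minimal sketch (the class name NormalEquationAccumulator is hypothetical) that builds X^T X and X^T y one sample at a time. Memory stays at p×p plus p doubles however many rows are added, and each row costs O(p^2) work; the resulting small system can then be solved with any dense solver.

```csharp
using System;

// Hypothetical helper: accumulates X^T X and X^T y row by row,
// so memory is O(p^2) regardless of the number of samples.
public sealed class NormalEquationAccumulator
{
    private readonly int p;
    public double[,] XtX { get; }
    public double[] Xty { get; }
    public long Count { get; private set; }

    public NormalEquationAccumulator(int p)
    {
        if (p < 1) throw new ArgumentOutOfRangeException(nameof(p));
        this.p = p;
        XtX = new double[p, p];
        Xty = new double[p];
    }

    // Adds one equation: row holds the p inputs (include a 1 for the intercept), y the output.
    public void Add(double[] row, double y)
    {
        if (row.Length != p) throw new ArgumentException("row length must equal p");
        for (int j = 0; j < p; j++)
        {
            for (int k = 0; k < p; k++)
                XtX[j, k] += row[j] * row[k];
            Xty[j] += row[j] * y;
        }
        Count++;
    }
}
```

One caveat: forming X^T X squares the condition number of X, so for badly scaled or nearly collinear inputs a QR-based least-squares solve is more accurate than the normal equations.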

Joe H answered Sep 21 '22 20:09


Try Meta.Numerics:

Meta.Numerics is a library for advanced scientific computation in the .NET Framework. It can be used from C#, Visual Basic, F#, or any other .NET programming language. The Meta.Numerics library is fully object-oriented and optimized for speed of implementation and execution.

To populate a matrix, see the example for the ColumnVector constructor (IList<Double>). It can construct a ColumnVector from many ordered collections of reals, including double[] and List<double>.

gimel answered Sep 18 '22 20:09