Least Squares method in practice

This is a very simple regression task. I have three variables x1, x2, x3 with some random noise, and I know the target equation: y = q1*x1 + q2*x2 + q3*x3. I want to find the target coefficients q1, q2, q3 and evaluate the performance of the prediction using the mean Relative Squared Error (RSE), (Prediction/Real - 1)^2.
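
The metric described above can be written as a small NumPy helper. This is a sketch: the function name mean_relative_squared_error is hypothetical, and it assumes that no real value is zero, since the ratio is undefined there.

```python
import numpy as np

def mean_relative_squared_error(prediction, real):
    """Mean of (prediction / real - 1)^2 over all samples.

    Assumes `real` contains no zeros (the ratio is undefined there).
    """
    prediction = np.asarray(prediction, dtype=float)
    real = np.asarray(real, dtype=float)
    return np.mean((prediction / real - 1.0) ** 2)

real = np.array([1.0, 2.0, 4.0])
# A perfect prediction gives zero error.
print(mean_relative_squared_error(real, real))  # 0.0
# Predicting 10% too high everywhere gives roughly 0.1**2 = 0.01.
print(mean_relative_squared_error(1.1 * real, real))
```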

From my research, I see that this is an ordinary least squares problem, but I can't work out from the examples on the internet how to solve this particular problem in Python. Let's say I have this data:

import numpy as np

sourceData = np.random.rand(1000, 3)  # 1000 samples of x1, x2, x3
koefs = np.array([1, 2, 3])           # true coefficients q1, q2, q3
target = np.dot(sourceData, koefs)    # y = q1*x1 + q2*x2 + q3*x3

(In real life the data are noisy, and the noise is not normally distributed.) How can I find these coefficients using a least squares approach in Python? Any library is fine.

Brans Ds asked Jun 16 '17 11:06


1 Answer

@ayhan made a valuable comment.

There is also a problem with your code: there is actually no noise in the data you generate. The input data is random, but after the multiplication you don't add any noise to the target, so it is an exact linear function of the inputs.

I've added some noise to your measurements and used the least squares formula to fit the parameters. Here's my code:

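import numpy as np

data = np.random.rand(1000, 3)

true_theta = np.array([1, 2, 3])
true_measurements = np.dot(data, true_theta)

# Zero-mean noise, uniform on [-0.5, 0.5). np.random.rand alone has mean 0.5,
# which would bias the fitted coefficients because the model has no intercept.
noise = np.random.rand(1000) - 0.5

noisy_measurements = true_measurements + noise

# Normal equations: theta = (X^T X)^-1 X^T y
estimated_theta = np.linalg.inv(data.T @ data) @ data.T @ noisy_measurements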
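
The formula in the last line is the normal-equation solution of least squares. With X = data, y = noisy_measurements and θ the parameter vector, it follows from setting the gradient of the squared error to zero. This assumes X^T X is invertible, i.e. the columns of data are linearly independent:

```latex
\begin{aligned}
S(\theta) &= \lVert y - X\theta \rVert^2 \\
\nabla_\theta S(\theta) &= -2\, X^\top (y - X\theta) = 0 \\
\Rightarrow\quad X^\top X\, \hat{\theta} &= X^\top y \\
\Rightarrow\quad \hat{\theta} &= (X^\top X)^{-1} X^\top y
\end{aligned}
```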

estimated_theta will be close to true_theta. If you don't add noise to the measurements, they will be equal (up to floating-point rounding).
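
Both claims can be checked by fitting the same inputs with and without noise. This is a sketch: normal_equation_fit is a hypothetical helper name, the seed is arbitrary, and the noise is assumed to be zero-mean (uniform on [-0.5, 0.5), so still not normally distributed).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
data = rng.random((1000, 3))
true_theta = np.array([1.0, 2.0, 3.0])
true_measurements = data @ true_theta

def normal_equation_fit(X, y):
    # Same formula as in the answer: (X^T X)^-1 X^T y
    return np.linalg.inv(X.T @ X) @ X.T @ y

# Without noise the estimate matches true_theta up to floating-point rounding.
exact_theta = normal_equation_fit(data, true_measurements)
print(np.allclose(exact_theta, true_theta))  # True

# With zero-mean, non-normal noise the estimate is close but not identical.
noise = rng.random(1000) - 0.5
noisy_theta = normal_equation_fit(data, true_measurements + noise)
print(noisy_theta)
```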

I've used the Python 3 matrix multiplication operator @ (available since Python 3.5). You could use np.dot instead of @.

That makes the code longer, so I've split the formula:

MTM_inv = np.linalg.inv(np.dot(data.T, data))  # (X^T X)^-1
MTy = np.dot(data.T, noisy_measurements)       # X^T y
estimated_theta = np.dot(MTM_inv, MTy)         # (X^T X)^-1 X^T y
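
A common refinement, not part of the original answer, is to solve the normal equations X^T X θ = X^T y with np.linalg.solve instead of forming the inverse explicitly, which is generally more accurate and cheaper. This sketch uses synthetic data with an arbitrary seed and zero-mean uniform noise; MTM is an illustrative name for X^T X.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed; any seed works
data = rng.random((1000, 3))
true_theta = np.array([1.0, 2.0, 3.0])
# Zero-mean uniform noise on [-0.5, 0.5)
noisy_measurements = np.dot(data, true_theta) + (rng.random(1000) - 0.5)

MTM = np.dot(data.T, data)                # X^T X
MTy = np.dot(data.T, noisy_measurements)  # X^T y

# Solve X^T X theta = X^T y directly instead of computing (X^T X)^-1
theta_solve = np.linalg.solve(MTM, MTy)

# Explicit-inverse version, for comparison
theta_inv = np.dot(np.linalg.inv(MTM), MTy)

print(np.allclose(theta_solve, theta_inv))  # True
print(theta_solve)
```

For a well-conditioned problem like this one both give the same answer; the difference matters when the columns of data are nearly collinear.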

You can read up on least squares here: https://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)#The_general_problem

UPDATE:

Or you could just use NumPy's built-in least squares function:

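# lstsq returns a tuple; the first element is the fitted coefficient vector
estimated_theta, residuals, rank, singular_values = np.linalg.lstsq(data, noisy_measurements, rcond=None)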
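
If the noise does not have zero mean (np.random.rand, for example, draws from [0, 1) with mean 0.5), a model without an intercept absorbs that offset into the coefficients and they come out biased. One remedy, sketched here as an assumption about your data rather than something the question specifies, is to append a column of ones so that lstsq also fits a constant term. Passing rcond=None uses the machine-precision cutoff; older NumPy versions warn when it is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed so the run is repeatable
data = rng.random((1000, 3))
true_theta = np.array([1.0, 2.0, 3.0])
# Noise with mean 0.5, as np.random.rand produces
noisy_measurements = data @ true_theta + rng.random(1000)

# Without an intercept, the 0.5 offset leaks into the three coefficients.
theta_no_intercept, *_ = np.linalg.lstsq(data, noisy_measurements, rcond=None)

# Append a column of ones; the last fitted value is the intercept (about 0.5).
X = np.column_stack([data, np.ones(len(data))])
theta_with_intercept, *_ = np.linalg.lstsq(X, noisy_measurements, rcond=None)
coefs, intercept = theta_with_intercept[:3], theta_with_intercept[3]

print(theta_no_intercept)  # each coefficient is shifted upward
print(coefs, intercept)
```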
lhk answered Oct 07 '22 20:10