Tensorflow linear regression result does not match Numpy/SciKit-Learn

Question

I am working through the Tensorflow examples from Aurelien Geron's "Hands-On Machine Learning" book. However, I am unable to replicate the simple linear regression example in this supporting notebook. Why doesn't Tensorflow match the Numpy/SciKit-Learn result?

As far as I can tell, there is no optimization (we are using the normal equation, so it's just matrix computations), and the answers seem too different to be precision errors.

import numpy as np
import tensorflow as tf
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

theta_value

Answer:

array([[ -3.74651413e+01],
       [  4.35734153e-01],
       [  9.33829229e-03],
       [ -1.06622010e-01],
       [  6.44106984e-01],
       [ -4.25131839e-06],
       [ -3.77322501e-03],
       [ -4.26648885e-01],
       [ -4.40514028e-01]], dtype=float32)

###### Compare with pure NumPy

X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)
theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(theta_numpy)

Answer:

[[ -3.69419202e+01]
 [  4.36693293e-01]
 [  9.43577803e-03]
 [ -1.07322041e-01]
 [  6.45065694e-01]
 [ -3.97638942e-06]
 [ -3.78654265e-03]
 [ -4.21314378e-01]
 [ -4.34513755e-01]]

###### Compare with Scikit-Learn

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))

print(np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T])

Answer:

[[ -3.69419202e+01]
 [  4.36693293e-01]
 [  9.43577803e-03]
 [ -1.07322041e-01]
 [  6.45065694e-01]
 [ -3.97638942e-06]
 [ -3.78654265e-03]
 [ -4.21314378e-01]
 [ -4.34513755e-01]]

Update: My question sounded similar to this one, but following the recommendation did not fix the problem.

ChoF · Accepted Answer

I just compare the results by tensorflow and numpy. Since you used dtype=tf.float32 for X and y, I will use np.float32 for the numpy example as follows:

X_numpy = housing_data_plus_bias.astype(np.float32)
y_numpy = housing.target.reshape(-1, 1).astype(np.float32)

Now let's try to compare the results by tf.matmul(XT, X) (tensorflow) and X.T.dot(X) (numpy):

with tf.Session() as sess:
    XTX_value = tf.matmul(XT, X).eval()
XTX_numpy = X_numpy.T.dot(X_numpy)

np.allclose(XTX_value, XTX_numpy, rtol=1e-06) # True
np.allclose(XTX_value, XTX_numpy, rtol=1e-07) # False

So this is the problem of float's precision. If you change the precision to tf.float64 and np.float64, you will have the same result for theta.

Tensorflow linear regression result does not match Numpy/SciKit-Learn

Tags:

python

numpy

tensorflow

scikit-learn

linear-regression

atkat12

1 Answers

ChoF

Recent Activity

Donate For Us

Tensorflow linear regression result does not match Numpy/SciKit-Learn

Tags:

python

numpy

tensorflow

scikit-learn

linear-regression

atkat12

1 Answers

ChoF

Related questions

Recent Activity

Donate For Us