
Why am I getting a "matrices are not aligned" error from the DataFrame dot function?

I am trying to implement simple linear regression in Python using NumPy and pandas. Calling the dot function, which according to the documentation performs matrix multiplication, raises ValueError: matrices are not aligned. Here is the code snippet:

import numpy as np
import pandas as pd

#initializing the matrices for X, y and theta
#dataset = pd.read_csv("data1.csv")
dataset = pd.DataFrame([[6.1101,17.592],[5.5277,9.1302],[8.5186,13.662],[7.0032,11.854],[5.8598,6.8233],[8.3829,11.886],[7.4764,4.3483],[8.5781,12]])
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
X.insert(0, "x_zero", np.ones(len(X)), True)
print(X)
print(f"\n{y}")
theta = pd.DataFrame([[0],[1]])
temp = pd.DataFrame([[1],[1]])
print(X.shape)
print(theta.shape)
print(X.dot(theta))

This is the output:

   x_zero       0
0     1.0  6.1101
1     1.0  5.5277
2     1.0  8.5186
3     1.0  7.0032
4     1.0  5.8598
5     1.0  8.3829
6     1.0  7.4764
7     1.0  8.5781

0    17.5920
1     9.1302
2    13.6620
3    11.8540
4     6.8233
5    11.8860
6     4.3483
7    12.0000
Name: 1, dtype: float64
(8, 2)
(2, 1)
Traceback (most recent call last):
  File "linear.py", line 16, in <module>
    print(X.dot(theta))
  File "/home/tejas/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 1063, in dot
    raise ValueError("matrices are not aligned")
ValueError: matrices are not aligned

As the printed shapes show, the number of columns in X (2) equals the number of rows in theta (2), so the dot function should return an 8x1 DataFrame. Why the error, then?

Tejas Joshi asked Sep 11 '25 02:09


1 Answer

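The misalignment does not come from the shapes but from the pandas labels. DataFrame.dot aligns the columns of X with the index of theta by label: X has the columns x_zero and 0, while theta has the default index 0 and 1, so the labels do not match. You have two options to fix this: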

Give theta an index that matches the columns of X:

theta = pd.DataFrame([[0],[1]], index=X.columns)

This way the labels being multiplied match.

Make the labels irrelevant by converting the second DataFrame to a NumPy array:

X.dot(theta.to_numpy())
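To check that both options agree, here is a minimal sketch on a shortened version of the data from the question. The three-row X and the variable names theta_default, theta_labelled, by_label and by_position are made up for illustration. Only DataFrame.dot, to_numpy and np.allclose are used.

```python
import numpy as np
import pandas as pd

# Small design matrix shaped like the one in the question:
# a column of ones labelled "x_zero" plus one feature column labelled 0.
X = pd.DataFrame({"x_zero": [1.0, 1.0, 1.0], 0: [6.1101, 5.5277, 8.5186]})

# Default index is [0, 1], which does not match X.columns ["x_zero", 0].
theta_default = pd.DataFrame([[0], [1]])

try:
    X.dot(theta_default)
    raised = False
except ValueError:
    # Labels differ, so pandas refuses to multiply.
    raised = True

# Option 1: give theta the same labels as the columns of X.
theta_labelled = pd.DataFrame([[0], [1]], index=X.columns)
by_label = X.dot(theta_labelled)

# Option 2: pass a plain array, so only the shapes matter.
by_position = X.dot(theta_default.to_numpy())

print(raised)
print(np.allclose(by_label.to_numpy(), by_position.to_numpy()))
```

Both options should produce the same 3x1 result here, because theta happens to be written in the same order as the columns of X.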

This label alignment is actually a useful pandas feature, since it matches rows and columns by name rather than by position. Your case just happens to be one where it becomes counterproductive ;)
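As a rough illustration of when the alignment helps, the sketch below uses made-up data in which the coefficients are listed in a different order from the columns. The column name x1 and the values are assumptions for the example only.

```python
import pandas as pd

X = pd.DataFrame({"x_zero": [1.0, 1.0], "x1": [2.0, 3.0]})

# theta rows are listed in the opposite order from the columns of X.
theta = pd.DataFrame({"theta": [10.0, 1.0]}, index=["x1", "x_zero"])

# dot pairs "x1" with "x1" and "x_zero" with "x_zero", regardless of order.
aligned = X.dot(theta)

# A plain array is multiplied by position, so the coefficients get swapped.
positional = X.dot(theta.to_numpy())

print(aligned)
print(positional)
```

With labels, each coefficient is applied to the column it was meant for. With a plain array, the order of the rows has to be kept consistent by hand.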

Grzegorz Skibinski answered Sep 13 '25 15:09