Considering the example data below, what is the correct way to perform matrix multiplication on data that is held in Polars?
In []: matrix_1 = pl.DataFrame({"col_1":[1,2,3],"col_2":[4,5,6], "col_3":[7,8,9]})
In []: matrix_2 = pl.DataFrame({"col_1":[9,8,7],"col_2":[6,5,4], "col_3":[3,2,1]})
I've used NumPy to perform the computation as follows:
In []: np.matmul(matrix_1, matrix_2)
Out[]:
array([[ 30,  24,  18],
       [ 84,  69,  54],
       [138, 114,  90]])
In []: np.dot(matrix_1, matrix_2)
Out[]:
array([[ 30,  24,  18],
       [ 84,  69,  54],
       [138, 114,  90]])
I was wondering whether there's a native way to do this that avoids copies, because in practice I'm working with much more data, and it would be great to have the ergonomics of not converting data in and out of NumPy.
P.S.: It would also be great to be able to use the @ operator through __matmul__, which, if I'm not mistaken, is not implemented in the Polars API.
The interoperability of Polars with NumPy is already pretty strong, as per the link @jqurious already posted in the comments.
You can also see that interoperability in the fact that Polars DataFrames can be passed directly as the input to np.dot.
It seems what you really need/want is a way to do the following while getting back a DataFrame:
matrix_1.dot(matrix_2)
shape: (3, 3)
┌───────┬───────┬───────┐
│ col_1 ┆ col_2 ┆ col_3 │
│ ---   ┆ ---   ┆ ---   │
│ i64   ┆ i64   ┆ i64   │
╞═══════╪═══════╪═══════╡
│ 30    ┆ 84    ┆ 138   │
│ 24    ┆ 69    ┆ 114   │
│ 18    ┆ 54    ┆ 90    │
└───────┴───────┴───────┘
You can achieve this by writing a helper function and then monkey patching it onto pl.DataFrame.
Just do:
import polars as pl
import numpy as np
def dot(self, rightdf):
    return pl.from_numpy(np.dot(self, rightdf), columns=rightdf.columns)

pl.DataFrame.dot = dot
After that, any matrix_1 and matrix_2 you create will have the dot method built in, as above.