I have a dataframe of shape (4, 3) as follows:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: x = pd.DataFrame(np.random.randn(4, 3), index=np.arange(4))
In [4]: x
Out[4]:
          0         1         2
0  0.959322  0.099360  1.116337
1 -0.211405 -2.563658 -0.561851
2  0.616312 -1.643927 -0.483673
3  0.235971  0.023823  1.146727
I want to multiply each column of the dataframe by a numpy array of shape (4,):
In [9]: y = np.random.randn(4)
In [10]: y
Out[10]: array([-0.34125522, 1.21567883, -0.12909408, 0.64727577])
In numpy, the following broadcasting trick works:
In [12]: x.values * y[:, None]
Out[12]:
array([[-0.32737369, -0.03390716, -0.38095588],
       [-0.25700028, -3.11658448, -0.68303043],
       [-0.07956223,  0.21222123,  0.06243928],
       [ 0.15273815,  0.01541983,  0.74224861]])
However, it doesn't work with a pandas dataframe. I get the following error:
In [13]: x * y[:, None]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-21d033742c49> in <module>()
----> 1 x * y[:, None]
...
ValueError: Shape of passed values is (1, 4), indices imply (3, 4)
Any suggestions?
Thanks!
You can use np.multiply to multiply two same-sized arrays together. This computes the Hadamard product: the two inputs have the same shape, and each element of the output is the product of the corresponding elements of the inputs.
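As a small illustration of the element-wise product described above, here is a sketch with two made-up 2x2 arrays (the values are assumptions chosen only so the result is easy to check by hand):

```python
import numpy as np

# Two same-shaped arrays; the values are illustrative only.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[10.0, 20.0], [30.0, 40.0]])

# Element-wise (Hadamard) product: entry [i, j] is a[i, j] * b[i, j].
hadamard = np.multiply(a, b)

# The * operator on ndarrays performs the same element-wise multiplication.
same_as_operator = a * b
```

Note that this is not matrix multiplication; for that you would use `a @ b` instead.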
Pandas DataFrame mul() method: the mul() method multiplies each value in the DataFrame by a specified value. The specified value must be an object that can be multiplied with the values of the DataFrame.
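For the question here, the relevant detail of mul() is its axis argument, which decides whether a 1-D array is matched against the rows or the columns. A minimal sketch, using a small hand-made frame rather than the random one from the question (the names `w`, `by_rows` and `by_cols` are illustrative):

```python
import numpy as np
import pandas as pd

# A 4x3 frame with easy-to-follow values: rows are [0,1,2], [3,4,5], ...
x = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3))
y = np.array([1.0, 10.0, 100.0, 1000.0])  # one factor per row

# axis=0 matches y against the row index, so row i is scaled by y[i].
by_rows = x.mul(y, axis=0)

# The default (axis='columns') matches against the column labels instead,
# so the array needs one entry per column.
w = np.array([1.0, 2.0, 3.0])
by_cols = x.mul(w)
```

With axis=0 the length-4 array lines up with the four rows, which is the behavior the question asks for.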
numpy.broadcast: The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
The term broadcasting comes from numpy. Simply put, it describes the rules that determine the result when you perform operations between n-dimensional arrays (which could be panels, dataframes, or series) or scalar values.
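To see why the question needs y[:, None] rather than plain y, it can help to ask NumPy directly which shapes are compatible. The sketch below uses placeholder arrays of the same shapes as in the question; the variable names are illustrative:

```python
import numpy as np

a = np.zeros((4, 3))   # same shape as the dataframe's values
y = np.arange(4.0)     # shape (4,)

# Shapes are compared from the trailing dimension backwards:
# (4, 3) against (4,) compares 3 with 4, which is incompatible.
try:
    np.broadcast(a, y)
    flat_ok = True
except ValueError:
    flat_ok = False

# y[:, None] has shape (4, 1); the size-1 axis is stretched to length 3.
column_shape = np.broadcast(a, y[:, None]).shape
```

Here `flat_ok` ends up False, while `column_shape` is (4, 3), matching the shape of the NumPy result shown in the question.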
I found an alternative way to multiply a pandas dataframe by a numpy array:
In [14]: x.multiply(y, axis=0)
Out[14]:
          0         1         2
0  0.195346  0.443061  1.219465
1  0.194664  0.242829  0.180010
2  0.803349  0.091412  0.098843
3  0.365711 -0.388115  0.018941
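Because the numbers in these sessions come from random draws, a quick way to convince yourself that multiply(y, axis=0) matches the NumPy broadcasting trick is to run both on the same data. This is only a sketch: the seeded generator and the names `via_pandas`, `via_numpy` and `agree` are my own assumptions, not part of the original session.

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption for reproducibility; any data would do).
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.standard_normal((4, 3)), index=np.arange(4))
y = rng.standard_normal(4)

# pandas route: scale row i by y[i].
via_pandas = x.multiply(y, axis=0)

# NumPy route: broadcast on the raw values, then restore the labels.
via_numpy = pd.DataFrame(x.values * y[:, None],
                         index=x.index, columns=x.columns)

# Both routes should agree element by element.
agree = np.allclose(via_pandas.values, via_numpy.values)
```

The pandas route keeps the index and columns automatically, whereas the NumPy route has to pass them back in when rebuilding the dataframe.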