How to multiply a pandas DataFrame by a NumPy array with broadcasting

I have a DataFrame of shape (4, 3) as follows:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: x = pd.DataFrame(np.random.randn(4, 3), index=np.arange(4))

In [4]: x
Out[4]: 
          0         1         2
0  0.959322  0.099360  1.116337
1 -0.211405 -2.563658 -0.561851
2  0.616312 -1.643927 -0.483673
3  0.235971  0.023823  1.146727

I want to multiply each column of the DataFrame element-wise by a NumPy array of shape (4,):

In [9]: y = np.random.randn(4)

In [10]: y
Out[10]: array([-0.34125522,  1.21567883, -0.12909408,  0.64727577])

In NumPy, the following broadcasting trick works:

In [12]: x.values * y[:, None]
Out[12]: 
array([[-0.32737369, -0.03390716, -0.38095588],
       [-0.25700028, -3.11658448, -0.68303043],
       [-0.07956223,  0.21222123,  0.06243928],
       [ 0.15273815,  0.01541983,  0.74224861]])
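The trick works because of NumPy's broadcasting rules: shapes are compared starting from the trailing axis, and an axis of length 1 is stretched to match the other operand. A minimal sketch with made-up, deterministic values (the names `x_values` and `col` are hypothetical stand-ins for `x.values` and `y[:, None]`) showing both the failing and the working shapes:

```python
import numpy as np

# Hypothetical deterministic stand-ins (the question used random data)
x_values = np.arange(12, dtype=float).reshape(4, 3)  # plays the role of x.values, shape (4, 3)
y = np.array([1.0, 2.0, 3.0, 4.0])                  # shape (4,)

# Without reshaping, the trailing axes are compared: 3 vs 4 -> incompatible
try:
    x_values * y
except ValueError as err:
    print("broadcast failed:", err)

# y[:, None] inserts a length-1 axis, giving a column of shape (4, 1)
col = y[:, None]
print(col.shape)  # (4, 1)

# (4, 3) * (4, 1): the length-1 axis is stretched across the 3 columns,
# so every entry in row i is multiplied by y[i]
result = x_values * col
print(result)
```

`y.reshape(-1, 1)` and `y[:, np.newaxis]` are equivalent ways to build the same (4, 1) column.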

However, it doesn't work with a pandas DataFrame; I get the following error:

In [13]: x * y[:, None]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-21d033742c49> in <module>()
----> 1 x * y[:, None]
...
ValueError: Shape of passed values is (1, 4), indices imply (3, 4)
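The traceback suggests that pandas did not apply NumPy's broadcasting rules here: it tried to line the (4, 1) array up against all three columns of `x` and failed. Whether pandas broadcasts an (n, 1) array on its own may vary between versions, so treat the following as a minimal, version-independent sketch, assuming you want to keep the NumPy trick: multiply the underlying array and wrap the result back into a DataFrame with the original labels. The data and the name `scaled` are hypothetical.

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for the question's random x and y
x = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), index=np.arange(4))
y = np.array([1.0, 2.0, 3.0, 4.0])

# Multiply the raw (4, 3) array by y reshaped to (4, 1), then restore
# the original row and column labels so the result is still a DataFrame
scaled = pd.DataFrame(x.to_numpy() * y[:, None], index=x.index, columns=x.columns)
print(scaled)
```

Note that this bypasses pandas' label alignment: `y` is matched to the rows purely by position.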

Any suggestions?

Thanks!

asked Aug 12 '15 at 16:08 by Wei Li

1 Answer

I found an alternative way to multiply a pandas DataFrame by a NumPy array. (The numbers below come from different random data, so they do not match the x and y shown in the question.)

In [14]: x.multiply(y, axis=0)
Out[14]: 
          0         1         2
0  0.195346  0.443061  1.219465
1  0.194664  0.242829  0.180010
2  0.803349  0.091412  0.098843
3  0.365711 -0.388115  0.018941
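`multiply` (an alias of `mul`) takes an `axis` argument that says which axis of the DataFrame the other operand is matched against. With `axis=0` (also spelled `axis='index'`), `y` is lined up with the rows, so row i is scaled by `y[i]`, which is what `y[:, None]` does in NumPy. A minimal sketch with made-up, deterministic data (the names `by_rows`, `y_reversed` and `by_label` are hypothetical) that contrasts this with the default `axis='columns'` and shows that a Series is aligned by index label rather than by position:

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for the question's random x and y
x = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), index=np.arange(4))
y = np.array([1.0, 2.0, 3.0, 4.0])

# axis=0: y is matched against the 4 rows, so row i is multiplied by y[i]
by_rows = x.multiply(y, axis=0)
print(by_rows)

# Default axis='columns': y would need one entry per column (3 here),
# so a length-4 array is rejected with a ValueError
try:
    x.multiply(y)
except ValueError as err:
    print("default axis failed:", err)

# A Series is aligned on index labels, not on position: even in reversed
# order, row label 2 is still multiplied by the value stored under label 2
y_reversed = pd.Series(y, index=x.index).iloc[::-1]
by_label = x.mul(y_reversed, axis=0)
print(by_label.sort_index())
```

The label alignment is the main reason to prefer `multiply(..., axis=0)` with a Series over the raw NumPy trick when the row order of the two objects might differ.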
answered Sep 25 '22 at 02:09 by Wei Li