
In-place row-wise operation on pandas DataFrame

Tags: python, pandas

Suppose I have this:

>>> import pandas
>>> x = pandas.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])
>>> print(x)
     A    B    C
0  1.0  2.0  3.0
1  3.0  4.0  5.0

Now I want to normalize x by row, that is, divide each row by its sum. As described in this question, this can be achieved with x = x.div(x.sum(axis=1), axis=0). However, this creates a new DataFrame. If my DataFrame is large, creating that new DataFrame can consume a lot of memory, even though I immediately assign the result back to the original name.
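For reference, a minimal sketch of the div approach mentioned above, showing that the result is a separate object and that x is left untouched until it is reassigned. The name normalized is only illustrative.

```python
import pandas as pd

x = pd.DataFrame([[1.0, 2.0, 3.0], [3, 4, 5]], columns=["A", "B", "C"])

# Divide each row by its sum; div returns a new DataFrame.
normalized = x.div(x.sum(axis=1), axis=0)

# The result is a separate object, and x itself is unchanged.
print(normalized is x)
print(x.loc[0, "A"])

# Each row of the result sums to 1 (up to floating-point rounding).
print(normalized.sum(axis=1))
```

Until x is rebound to the result, both the original and the normalized data are held in memory at once, which is the cost the question is about.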

Is there an efficient way to perform this operation in place? I want something like x.idiv() that provides the axis option of div but updates x in place. For this specific case I need the division, but sometimes it would also be nice to have similar in-place versions for all the basic operations.

(I can update it in place by iterating over it row-wise and assigning each normalized row back into the original, but this is slow, and I'm looking for a more efficient solution.)

asked Nov 08 '13 07:11 by BrenBarn


1 Answer

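You can do this directly in numpy (without creating a copy). This relies on x.values being a writable view of the DataFrame's data, which holds when all columns share a single dtype, as they do here; with mixed dtypes, or with Copy-on-Write enabled in newer pandas versions, the in-place update will not reach x: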

In [11]: x1 = x.values.T

In [12]: x1
Out[12]: 
array([[ 1.,  3.],
       [ 2.,  4.],
       [ 3.,  5.]])

In [13]: x1 /= x1.sum(0)

In [14]: x
Out[14]: 
          A         B         C
0  0.166667  0.333333  0.500000
1  0.250000  0.333333  0.416667
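The transpose above is only there so that the row sums broadcast along the right axis. As a rough sketch of the same idea without it, the division below is done on a plain NumPy array with keepdims=True. The array arr is a stand-in for the DataFrame's data, and address_before and same_buffer are illustrative names; whether an array obtained from a DataFrame actually shares memory with it is an assumption that depends on the dtypes and the pandas version.

```python
import numpy as np

# Plain float array standing in for the DataFrame's underlying data.
arr = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
address_before = arr.ctypes.data  # memory address of the data buffer

# keepdims=True keeps the row sums as shape (2, 1), so they broadcast
# across each row and no transpose is needed.
arr /= arr.sum(axis=1, keepdims=True)

# The division wrote into the same buffer instead of allocating a new one.
same_buffer = arr.ctypes.data == address_before
print(arr)
print(same_buffer)
```

Only the small vector of row sums is allocated as a temporary; the full-size array is updated where it sits.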

Perhaps there ought to be an inplace flag for div...?

answered Nov 17 '22 15:11 by Andy Hayden