Subtract first row from all rows in Pandas DataFrame

Tags:

I have a pandas dataframe:

a = pd.DataFrame(rand(5,6)*10, index=pd.DatetimeIndex(start='2005', periods=5, freq='A'))
a.columns = pd.MultiIndex.from_product([('A','B'),('a','b','c')])

I want to subtract the row a['2005'] from a. To do that I've tried this:

In [22]:

a - a.ix['2005']

Out[22]:
    A   B
    a   b   c   a   b   c
2005-12-31  0   0   0   0   0   0
2006-12-31  NaN     NaN     NaN     NaN     NaN     NaN
2007-12-31  NaN     NaN     NaN     NaN     NaN     NaN
2008-12-31  NaN     NaN     NaN     NaN     NaN     NaN
2009-12-31  NaN     NaN     NaN     NaN     NaN     NaN

Which obviously doesn't work because pandas is lining up the index while doing the operation. This works:

In [24]:

pd.DataFrame(a.values - a['2005'].values, index=a.index, columns=a.columns)

Out[24]:
    A   B
    a   b   c   a   b   c
2005-12-31  0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
2006-12-31  -3.326761   -7.164628   8.188518    -0.863177   0.519587    -3.281982
2007-12-31  3.529531    -4.719756   8.444488    1.355366    7.468361    -4.023797
2008-12-31  3.139185    -8.420257   1.465101    -2.942519   1.219060    -5.146019
2009-12-31  -3.459710   0.519435    -1.049617   -2.779370   4.792227    -1.922461

But I don't want to have to form a new DataFrame every time I have to do this kind of operation. I've tried the apply() method like this: a.apply(lambda x: x-a['2005'].values) but I get ValueError: cannot copy sequence with size 6 to array axis with dimension 5 So I'm not really sure how to proceed. Is there a simple way to do this that I am not seeing? I think there should be an easy way to do this in place so you don't have to construct a new dataframe each time. I also tried the sub() method but the subtraction is only applied to the first row whereas I want to subtract the first row from each row in the dataframe.

979

asked Jun 23 '14 16:06

pbreach

1 Answers

Pandas is great for aligning by index. So when you want Pandas to ignore the index, you need to drop the index. You can do that by converting the DataFrame a.loc['2005'] to a 1-dimensional NumPy array:

In [56]: a - a.loc['2005'].values.squeeze()
Out[56]: 
                   A                             B                    
                   a         b         c         a         b         c
2005-12-31  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2006-12-31  0.325968  1.314776 -0.789328 -0.344669 -2.518857  7.361711
2007-12-31  0.084203  2.234445 -2.838454 -6.176795 -3.645513  8.955443
2008-12-31  3.798700  0.299529  1.303325 -2.770126 -1.284188  3.093806
2009-12-31  1.520930  2.660040  0.846996 -9.437851 -2.886603  6.705391

The squeeze method converts the NumPy array, a.loc['2005'], of shape to (1, 6) to an array of shape (6,). This allows the array to be broadcasted (during the subtraction) as desired.

103

answered Sep 21 '22 16:09

unutbu

Related questions
                            
                                Loop through netcdf files and run calculations - Python or R
                            
                                How to remove items from a list that contains words found in items in another list [duplicate]
                            
                                Construct two dimensional numpy array from indices and values of a one dimensional array
                            
                                How does Flask-login with multiple servers work
                            
                                How do I test for null list entry in python list
                            
                                Why do python programs run very slow the first time?
                            
                                In Python, how do I determine if an object still exists?
                            
                                Django, Detecting if variable is a number
                            
                                Python-crontab package returns __init__ error
                            
                                kNN with big sparse matrices in Python
                            
                                How to reference a ModelView in flask-admin
                            
                                python no module named serial
                            
                                TypeError: argument of type 'char const *'
                            
                                Empty a variable without destroying it
                            
                                frozenset() - Example of when one might use them
                            
                                Python TypeError: unsupported operand type(s) for /: 'NoneType' and 'float'
                            
                                How to fix issue with the removed cssselect package in lxml?
                            
                                Disable or lock text edit PyQT
                            
                                Multiprocess multiple files in a list
                            
                                Masking multiple columns on a pandas dataframe in python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Subtract first row from all rows in Pandas DataFrame

Tags:

python

pandas

dataframe

numpy

pbreach

People also ask

1 Answers

unutbu

Recent Activity

Donate For Us