
Subtract first row from all rows in Pandas DataFrame

I have a pandas dataframe:

import numpy as np
import pandas as pd
a = pd.DataFrame(np.random.rand(5, 6) * 10, index=pd.date_range(start='2005', periods=5, freq='A'))
a.columns = pd.MultiIndex.from_product([('A','B'),('a','b','c')])

I want to subtract the row a['2005'] from a. To do that I've tried this:

In [22]:

a - a.ix['2005']

Out[22]:
              A              B
              a    b    c    a    b    c
2005-12-31    0    0    0    0    0    0
2006-12-31  NaN  NaN  NaN  NaN  NaN  NaN
2007-12-31  NaN  NaN  NaN  NaN  NaN  NaN
2008-12-31  NaN  NaN  NaN  NaN  NaN  NaN
2009-12-31  NaN  NaN  NaN  NaN  NaN  NaN

That obviously doesn't work, because pandas aligns on the index when performing the operation. This works:

In [24]:

pd.DataFrame(a.values - a['2005'].values, index=a.index, columns=a.columns)

Out[24]:
                   A                             B                    
                   a         b         c         a         b         c
2005-12-31  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2006-12-31 -3.326761 -7.164628  8.188518 -0.863177  0.519587 -3.281982
2007-12-31  3.529531 -4.719756  8.444488  1.355366  7.468361 -4.023797
2008-12-31  3.139185 -8.420257  1.465101 -2.942519  1.219060 -5.146019
2009-12-31 -3.459710  0.519435 -1.049617 -2.779370  4.792227 -1.922461

But I don't want to have to form a new DataFrame every time I do this kind of operation. I've tried the apply() method like this:

a.apply(lambda x: x-a['2005'].values)

but I get:

ValueError: cannot copy sequence with size 6 to array axis with dimension 5

So I'm not really sure how to proceed. Is there a simple way to do this that I am not seeing? I think there should be an easy way to do this in place so you don't have to construct a new DataFrame each time. I also tried the sub() method, but the subtraction is only applied to the first row, whereas I want to subtract the first row from each row in the DataFrame.

asked Jun 23 '14 16:06 by pbreach



1 Answer

Pandas is great for aligning by index. So when you want Pandas to ignore the index, you need to drop the index. You can do that by converting the DataFrame a.loc['2005'] to a 1-dimensional NumPy array:

In [56]: a - a.loc['2005'].values.squeeze()
Out[56]: 
                   A                             B                    
                   a         b         c         a         b         c
2005-12-31  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2006-12-31  0.325968  1.314776 -0.789328 -0.344669 -2.518857  7.361711
2007-12-31  0.084203  2.234445 -2.838454 -6.176795 -3.645513  8.955443
2008-12-31  3.798700  0.299529  1.303325 -2.770126 -1.284188  3.093806
2009-12-31  1.520930  2.660040  0.846996 -9.437851 -2.886603  6.705391

The squeeze method converts the NumPy array a.loc['2005'].values, of shape (1, 6), to an array of shape (6,). This allows the array to be broadcast across the rows (during the subtraction) as desired.
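To make the shapes concrete, here is a minimal, self-contained sketch of the same idea. It assumes a recent pandas and NumPy; the names idx, cols, rng, first, flat, result and alt are made up for illustration and do not appear in the question. The index is built explicitly with pd.to_datetime rather than with a frequency alias, and the last line shows a different route (subtracting the Series a.iloc[0], which pandas aligns on the column labels instead of the row index) that is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random).
idx = pd.to_datetime([f"{year}-12-31" for year in range(2005, 2010)])
cols = pd.MultiIndex.from_product([("A", "B"), ("a", "b", "c")])
rng = np.random.default_rng(0)
a = pd.DataFrame(rng.random((5, 6)) * 10, index=idx, columns=cols)

first = a.loc["2005"].values   # 2-D NumPy array, shape (1, 6)
flat = first.squeeze()         # 1-D NumPy array, shape (6,)

# A plain array carries no index, so there is nothing to align on:
# the 6 values are broadcast across every row of a.
result = a - flat

# Alternative (not from the answer): a Series taken from the first row
# is aligned on the column labels, so it is also subtracted from each row.
alt = a - a.iloc[0]
```

Both result and alt keep the index and columns of a, and their first row is all zeros.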

answered Sep 21 '22 16:09 by unutbu