 

What does pandas' sub operator do?

Tags:

python

pandas

This comes straight from the tutorial, and I can't understand it even after reading the docs.

In [14]: df = DataFrame({'one' : Series(randn(3), index=['a', 'b', 'c']),
   ....:                 'two' : Series(randn(4), index=['a', 'b', 'c', 'd']),
   ....:                 'three' : Series(randn(3), index=['b', 'c', 'd'])})
   ....: 

In [15]: df
Out[15]: 
        one     three       two
a -0.626544       NaN -0.351587
b -0.138894 -0.177289  1.136249
c  0.011617  0.462215 -0.448789
d       NaN  1.124472 -1.101558

In [16]: row = df.iloc[1]

In [17]: column = df['two']

In [18]: df.sub(row, axis='columns')
Out[18]: 
        one     three       two
a -0.487650       NaN -1.487837
b  0.000000  0.000000  0.000000
c  0.150512  0.639504 -1.585038
d       NaN  1.301762 -2.237808

Why does the second row turn into 0? Is it being sub-stituted with 0?

Also, when I use row = df.iloc[0], the entire second column ('three') turns into NaN. Why?

Heisenberg asked Jun 23 '26 17:06



1 Answer

sub means subtract, so let's walk through this:

In [44]:
import numpy as np
import pandas as pd

# create some data
df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                    'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                    'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
df
Out[44]:
        one     three       two
a -1.536737       NaN  1.537104
b  1.486947 -0.429089 -0.227643
c  0.219609 -0.178037 -1.118345
d       NaN  1.254126 -0.380208
In [45]:
# take the 2nd row (by position) as a Series
row = df.iloc[1]
row
Out[45]:
one      1.486947
three   -0.429089
two     -0.227643
Name: b, dtype: float64
In [46]:
# now subtract the 2nd row row-wise
df.sub(row, axis='columns')
Out[46]:
        one     three       two
a -3.023684       NaN  1.764747
b  0.000000  0.000000  0.000000
c -1.267338  0.251052 -0.890702
d       NaN  1.683215 -0.152565

So what's probably confusing you is the effect of specifying 'columns' as the axis to operate on. We've subtracted the 2nd row from every row, which is why the 2nd row is now all 0s: each of its values has had itself subtracted. The object you passed is a Series, and with axis='columns' its index (one, three, two) is aligned against the DataFrame's column labels. Each value in the Series is therefore subtracted from the column with the matching name, and that is repeated for every row, which is why the operation looks row-wise.
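To make the label alignment concrete, here is a small sketch with fixed, made-up numbers instead of random ones (the values are illustrative assumptions, not taken from the outputs above). It checks that df.sub(row, axis='columns') matches the plain df - row operator, and that reversing the order of the Series' index gives the same numbers once the columns are put back in order, i.e. matching is by name, not by position. For contrast it also uses the column = df['two'] Series from the question with axis='index', where the Series is matched against the row labels instead.

```python
import numpy as np
import pandas as pd

# Same labels as the example above, but made-up deterministic values.
df = pd.DataFrame({'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
                   'two': pd.Series([10.0, 20.0, 30.0, 40.0], index=['a', 'b', 'c', 'd']),
                   'three': pd.Series([100.0, 200.0, 300.0], index=['b', 'c', 'd'])})

row = df.iloc[1]  # row 'b' as a Series indexed by the column names
by_columns = df.sub(row, axis='columns')

# The '-' operator broadcasts a Series across the columns in the same way.
assert by_columns.equals(df - row)

# Alignment is by label: a reversed Series gives the same values
# (reindex only restores the column order for the comparison).
reversed_result = df.sub(row.iloc[::-1], axis='columns').reindex(columns=df.columns)
assert by_columns.equals(reversed_result)

# Row 'b' minus itself is all zeros.
print(by_columns.loc['b'].tolist())  # [0.0, 0.0, 0.0]

# axis='index' matches the Series against the row labels instead,
# subtracting the 'two' column from every column.
column = df['two']
by_index = df.sub(column, axis='index')
print(by_index['two'].tolist())  # [0.0, 0.0, 0.0, 0.0]
```

In short, axis picks which of the DataFrame's labels the Series index is matched against: the column names for 'columns', the row labels for 'index'.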

In [47]:
# now take the first row (by position)
row = df.iloc[0]
row
Out[47]:
one     -1.536737
three         NaN
two      1.537104
Name: a, dtype: float64
In [48]:
# perform the same op
df.sub(row, axis='columns')
Out[48]:
        one  three       two
a  0.000000    NaN  0.000000
b  3.023684    NaN -1.764747
c  1.756346    NaN -2.655449
d       NaN    NaN -1.917312

So why do we now have a column of all NaN values? Row 'a' has NaN in the 'three' column, so every value in that column has NaN subtracted from it, and any arithmetic operation involving NaN produces NaN:

In [55]:
import numpy as np
print(1 + np.nan)
print(1 * np.nan)
print(1 / np.nan)
print(1 - np.nan)
nan
nan
nan
nan
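The NaN column traces back to a single value: the NaN under 'three' in row 'a'. Here is a minimal sketch with made-up numbers that checks this, followed by one possible workaround if treating the missing value as 0 is acceptable for your data (that choice is an assumption, not something the tutorial prescribes): fill the Series with Series.fillna before subtracting.

```python
import numpy as np
import pandas as pd

# Made-up deterministic values; row 'a' has NaN in 'three'.
df = pd.DataFrame({'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
                   'two': pd.Series([10.0, 20.0, 30.0, 40.0], index=['a', 'b', 'c', 'd']),
                   'three': pd.Series([100.0, 200.0, 300.0], index=['b', 'c', 'd'])})

row = df.iloc[0]  # row 'a': one=1.0, two=10.0, three=NaN
result = df.sub(row, axis='columns')

# Every value in 'three' has NaN subtracted from it, so the whole column is NaN.
print(result['three'].isna().all())  # True

# Possible workaround (assumes 0 is an acceptable stand-in for the missing value).
filled = df.sub(row.fillna(0), axis='columns')
print(filled['three'].tolist())  # [nan, 100.0, 200.0, 300.0]
```

Note that the cell that was already NaN in df (row 'a', column 'three') stays NaN, because filling the Series does not touch the DataFrame itself.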
EdChum answered Jun 25 '26 08:06



