This comes straight from the tutorial, and I can't understand it even after reading the docs.
In [14]: df = DataFrame({'one' : Series(randn(3), index=['a', 'b', 'c']),
....: 'two' : Series(randn(4), index=['a', 'b', 'c', 'd']),
....: 'three' : Series(randn(3), index=['b', 'c', 'd'])})
....:
In [15]: df
Out[15]:
        one     three       two
a -0.626544       NaN -0.351587
b -0.138894 -0.177289  1.136249
c  0.011617  0.462215 -0.448789
d       NaN  1.124472 -1.101558
In [16]: row = df.iloc[1]
In [17]: column = df['two']
In [18]: df.sub(row, axis='columns')
Out[18]:
        one     three       two
a -0.487650       NaN -1.487837
b  0.000000  0.000000  0.000000
c  0.150512  0.639504 -1.585038
d       NaN  1.301762 -2.237808
Why does the second row turn into all 0s? Is sub substituting the values with 0?
Also, when I use row = df.iloc[0], the entire second column turns into NaN. Why?
sub means subtract (not substitute), so let's walk through this step by step:
In [44]:
import numpy as np
import pandas as pd

# create some data
df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
df
Out[44]:
        one     three       two
a -1.536737       NaN  1.537104
b  1.486947 -0.429089 -0.227643
c  0.219609 -0.178037 -1.118345
d       NaN  1.254126 -0.380208
In [45]:
# select the 2nd row (label 'b') by position
row = df.iloc[1]
row
Out[45]:
one 1.486947
three -0.429089
two -0.227643
Name: b, dtype: float64
In [46]:
# now subtract the 2nd row from every row, matching on column labels
df.sub(row, axis='columns')
Out[46]:
        one     three       two
a -3.023684       NaN  1.764747
b  0.000000  0.000000  0.000000
c -1.267338  0.251052 -0.890702
d       NaN  1.683215 -0.152565
What is probably confusing you is the effect of axis='columns'. We've subtracted the 2nd row from every row, which is why the 2nd row is now all 0s (each value minus itself). The object you passed is a Series whose index holds the column names (one, three, two), and axis='columns' tells pandas to align that index against the DataFrame's column labels. So row['one'] is subtracted from every value in column 'one', row['three'] from every value in column 'three', and so on, which is why the operation is applied row by row.
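Here is a minimal reproducible sketch of that alignment, assuming a recent pandas (where .ix no longer exists, so .iloc does the positional selection). The fixed numbers are made up to replace randn, and the axis='index' call using the question's column variable is an added contrast, not part of the walkthrough above.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values in place of randn, so the output is reproducible
df = pd.DataFrame({'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
                   'two': pd.Series([10.0, 20.0, 30.0, 40.0], index=['a', 'b', 'c', 'd']),
                   'three': pd.Series([100.0, 200.0, 300.0], index=['b', 'c', 'd'])})

row = df.iloc[1]      # row 'b' as a Series; its index is the column labels
column = df['two']    # column 'two' as a Series; its index is the row labels

# axis='columns': row's index is matched against df's column labels, so
# row['one'] is subtracted from every value in column 'one', and so on
by_columns = df.sub(row, axis='columns')

# The plain '-' operator between a DataFrame and a Series does the same thing
assert by_columns.equals(df - row)

# Row 'b' minus itself gives 0.0 in every column
print(by_columns.loc['b'].tolist())   # [0.0, 0.0, 0.0]

# Matching is by label, not position: reversing row changes nothing
# (selecting df.columns just restores the original column order)
reversed_result = df.sub(row.iloc[::-1], axis='columns')[df.columns]
assert reversed_result.equals(by_columns)

# Contrast: axis='index' matches column's index against df's row labels,
# so column['a'] is subtracted from every value in row 'a', and so on
by_index = df.sub(column, axis='index')
print(by_index['two'].tolist())       # [0.0, 0.0, 0.0, 0.0]
```

In recent pandas the columns come out in the dict's insertion order (one, two, three) rather than the sorted order shown in the transcripts, but the subtraction works the same way.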
In [47]:
# now select the 1st row (label 'a') by position
row = df.iloc[0]
row
Out[47]:
one -1.536737
three NaN
two 1.537104
Name: a, dtype: float64
In [48]:
# perform the same op
df.sub(row, axis='columns')
Out[48]:
        one     three       two
a  0.000000       NaN  0.000000
b  3.023684       NaN -1.764747
c  1.756346       NaN -2.655449
d       NaN       NaN -1.917312
So why do we now have a column of all NaN values? Because the first row (label 'a') has NaN in column 'three', so NaN is what gets subtracted from every value in that column, and any arithmetic operation involving NaN returns NaN:
In [55]:
print(1 + np.nan)
print(1 * np.nan)
print(1 / np.nan)
print(1 - np.nan)
nan
nan
nan
nan
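To connect that rule back to the DataFrame, here is a minimal sketch, again with made-up fixed values standing in for randn and .iloc in place of the removed .ix. Row 'a' has no value for 'three', so NaN is subtracted from every entry of that column.

```python
import numpy as np
import pandas as pd

# Same hypothetical fixed values as a reproducible stand-in for randn
df = pd.DataFrame({'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
                   'two': pd.Series([10.0, 20.0, 30.0, 40.0], index=['a', 'b', 'c', 'd']),
                   'three': pd.Series([100.0, 200.0, 300.0], index=['b', 'c', 'd'])})

row = df.iloc[0]                      # row 'a': it has no value for 'three'
print(row['three'])                   # nan

result = df.sub(row, axis='columns')

# Every value in column 'three' had NaN subtracted from it, so all are NaN
print(result['three'].isna().all())   # True

# Equivalent to doing that one column by hand
assert result['three'].equals(df['three'] - row['three'])

# Other columns only get NaN where df itself was already missing
print(result['one'].tolist())         # [0.0, 1.0, 2.0, nan]
```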