I am relatively new to Python, and I am stuck at this point:
import numpy as np
from pandas import DataFrame
frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                  index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series = frame.iloc[:, 0]
frame.sub(series, axis=1, fill_value=0)
Gives this error:
C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in _combine_match_columns(self, other, func, level, fill_value)
3470 if fill_value is not None:
3471 raise NotImplementedError("fill_value %r not supported" %
-> 3472 fill_value)
3473
3474 new_data = left._data.eval(
NotImplementedError: fill_value 0 not supported
But according to the documentation of the DataFrame.sub method, the fill_value parameter is supported.
Can somebody explain this error?
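The fill_value error message is largely a red herring here. The traceback shows that this code path (_combine_match_columns, which is used when the other operand is a Series) rejects any fill_value, but the more fundamental problem is that sub() is aligning the wrong labels, and fill_value just happens to be where it crashes.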
To see this, take out the fill value:
frame.sub(series, axis=1)
Out[194]:
        Ohio  Oregon  Texas  Utah   b   d   e
Utah     NaN     NaN    NaN   NaN NaN NaN NaN
Ohio     NaN     NaN    NaN   NaN NaN NaN NaN
Texas    NaN     NaN    NaN   NaN NaN NaN NaN
Oregon   NaN     NaN    NaN   NaN NaN NaN NaN
That is almost certainly not what you intended. Now if you inspect series, you'll see that it is named 'b':
series.name
Out[197]: 'b'
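But pandas does not automatically align a Series named 'b' with the 'b' column of frame. With axis=1, it matches the Series' index labels (Utah, Ohio, Texas, Oregon) against frame's column labels (b, d, e), and since none of them overlap, every cell comes out NaN. Whether it should use the name or not, I don't know, but the fix suggested in the comment by @AntonProtopopv lets pandas get the alignment of column 'b' right: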
frame.sub(series.to_frame(), axis=1)
Out[195]:
          b   d   e
Utah    0.0 NaN NaN
Ohio    0.0 NaN NaN
Texas   0.0 NaN NaN
Oregon  0.0 NaN NaN
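To see which labels are actually being matched in each case, here is a small check. It is a sketch assuming a reasonably recent pandas; the names misaligned and aligned are just for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series = frame.iloc[:, 0]

# With axis=1, the Series' *index* (row labels) is matched against frame.columns.
# series.name ('b') plays no part in this alignment.
print(list(series.index))   # ['Utah', 'Ohio', 'Texas', 'Oregon']
print(list(frame.columns))  # ['b', 'd', 'e']

# No labels overlap, so the result's columns are the union of both label sets
# and every cell is NaN.
misaligned = frame.sub(series, axis=1)
print(sorted(misaligned.columns))     # ['Ohio', 'Oregon', 'Texas', 'Utah', 'b', 'd', 'e']
print(misaligned.isna().all().all())  # True

# After to_frame(), the right-hand side is a DataFrame with one column 'b',
# so 'b' lines up with frame['b'] and the rows line up by index.
aligned = frame.sub(series.to_frame(), axis=1)
print(aligned['b'].tolist())  # [0.0, 0.0, 0.0, 0.0]
```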
I'm not sure exactly what you wanted to do here, but regardless, if you get the alignment straightened out first, fill_value should work as expected. Or, to be honest, I'd probably just use fillna in a method chain, as suggested by @NickilMaveli, since that is a more explicit (and hence better) way to fill missing values.
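Here is what both options look like once the alignment is fixed. This is a sketch assuming a recent pandas and assuming the fillna is applied to the result of the subtraction; the names filled and chained are only illustrative. Note that the two are not interchangeable: fill_value treats a value that is missing on one side as 0 before subtracting, while fillna replaces the NaN results afterwards.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series = frame.iloc[:, 0]

# The other operand is now a DataFrame, so fill_value is accepted.
# Columns 'd' and 'e' are absent on the right, so they are treated as 0 there
# and the result keeps frame's original values in those columns.
filled = frame.sub(series.to_frame(), axis=1, fill_value=0)
print(filled['d'].tolist())   # [1.0, 4.0, 7.0, 10.0]

# Subtract first, then fill: the NaN results in 'd' and 'e' become 0 instead.
chained = frame.sub(series.to_frame(), axis=1).fillna(0)
print(chained['d'].tolist())  # [0.0, 0.0, 0.0, 0.0]
```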
One final note: if you meant to use NumPy-style broadcasting here (i.e. subtract column 'b' from every column), it is often easier to convert to arrays first and then perform the subtraction:
frame.values - series.values.reshape(-1, 1)
Out[204]:
array([[0., 1., 2.],
[0., 1., 2.],
[0., 1., 2.],
[0., 1., 2.]])
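If subtracting column 'b' from every column is indeed the goal, pandas can also do it directly while keeping the row and column labels, by aligning the Series on the row index with axis=0 instead of axis=1. A sketch, assuming pandas 0.24 or later (for to_numpy); by_rows and by_numpy are illustrative names:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series = frame.iloc[:, 0]

# axis=0 (or axis='index') matches series.index against frame.index,
# so each row has its own 'b' value subtracted from every column.
by_rows = frame.sub(series, axis=0)

# The same operation with plain NumPy broadcasting, for comparison.
by_numpy = frame.to_numpy() - series.to_numpy().reshape(-1, 1)

print(by_rows)
print(np.array_equal(by_rows.to_numpy(), by_numpy))  # True
```

The labelled version is usually the safer choice, because it still lines rows up by name if the Series happens to be in a different order than the frame.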