Here is a minimal working example of my problem:
import pandas as pd
columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
a = pd.DataFrame(0.0, index=range(3),columns=columns, dtype='float')
b = pd.Series([13.0, 15.0])
a.loc[1,'b'] = b # this line results in NaNs
a.loc[1,'b'] = b.values # this yields correct behavior
Why is the first assignment incorrect? Both Series seem to have the same index, so I assume it should produce the correct result.
I am using pandas 0.17.0.
When you write
a.loc[1,'b'] = b
and b is a Series, the index of b has to exactly match the indexer generated by a.loc[1,'b'] in order for the values in b to be copied into a. It turns out, however, that when a.columns is a MultiIndex, the indexer for a.loc[1,'b'] is:
(Pdb) p new_ix
Index([(u'b', 0), (u'b', 1)], dtype='object')
whereas the index for b is
(Pdb) p ser.index
Int64Index([0, 1], dtype='int64')
They don't match, and therefore
(Pdb) p ser.index.equals(new_ix)
False
Since the values aren't aligned, the code branch you fall into assigns
(Pdb) p ser.reindex(new_ix).values
array([ nan, nan])
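The mismatch can also be reproduced without a debugger. This is only a minimal sketch: new_ix is rebuilt by hand to resemble the indexer printed above, not taken from pandas internals, and it assumes a pandas version whose Index constructor accepts tupleize_cols=False (used here to keep the tuples in a flat object Index).

```python
import pandas as pd

b = pd.Series([13.0, 15.0])

# Hand-built stand-in for the indexer shown above (an assumption for
# illustration): a flat object Index whose labels are tuples.
new_ix = pd.Index([('b', 0), ('b', 1)], tupleize_cols=False)

# Integer labels 0 and 1 are compared against the tuple labels.
labels_match = b.index.equals(new_ix)

# Reindexing looks up each tuple label in b's index; none of them exist there.
aligned = b.reindex(new_ix)

print(labels_match)
print(aligned.isnull().all())
```

Because alignment is done by label rather than by position, every label that is missing from b becomes NaN in the reindexed result.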
I found this by adding pdb.set_trace() to your code:
import pandas as pd
columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
a = pd.DataFrame(0.0, index=range(3),columns=columns, dtype='float')
b = pd.Series([13.0, 15.0])
import pdb
pdb.set_trace()
a.loc[1,'b'] = b # this line results in NaNs
a.loc[1,'b'] = b.values # this yields correct behavior
and simply stepping through it at a "high level", which showed that the problem occurs in
if isinstance(value, ABCSeries):
    value = self._align_series(indexer, value)
and then stepping through it again (with a finer-toothed comb) with a breakpoint at the line calling self._align_series(indexer, value).
Notice that if you change the index of b to also be a MultiIndex:
b = pd.Series([13.0, 15.0], index=pd.MultiIndex.from_product([['b'], [0,1]]))
then
import pandas as pd
columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
a = pd.DataFrame(0.0, index=range(3),columns=columns, dtype='float')
b = pd.Series([13.0, 15.0], index=pd.MultiIndex.from_product([['b'], [0,1]]))
a.loc[1,'b'] = b
print(a)
yields
   a      b      c
   0  1   0   1  0  1
0  0  0   0   0  0  0
1  0  0  13  15  0  0
2  0  0   0   0  0  0
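If the labels on b carry no meaning and you only want the values placed by position, the b.values workaround from the question can be wrapped with a length check so that a size mismatch fails loudly. This is a minimal sketch; assign_by_position is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def assign_by_position(frame, row, top_label, series):
    """Write series values into frame.loc[row, top_label] by position.

    Hypothetical helper: it bypasses label alignment by passing a plain
    array, exactly like the b.values workaround in the question.
    """
    target = frame.loc[row, top_label]
    if len(series) != len(target):
        raise ValueError(
            "expected %d values, got %d" % (len(target), len(series))
        )
    frame.loc[row, top_label] = series.values


columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
b = pd.Series([13.0, 15.0])

assign_by_position(a, 1, 'b', b)
print(a)
```

Use this only when position is what you mean. When the labels do matter, giving b a matching MultiIndex as shown above keeps the alignment check working for you.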