 

Assigning values to Pandas Multiindex DataFrame by index level

I have a Pandas multiindex dataframe and I need to assign values to one of the columns from a series. The series shares its index with the first level of the index of the dataframe.

import pandas as pd
import numpy as np
idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index = [idx0, idx1], columns = ['A', 'B'])
s = pd.Series([True, False, True],index = np.unique(idx0))
print(df)
print(s)

out:

             A    B
bar one    NaN  NaN
    two    NaN  NaN
    three  NaN  NaN
baz one    NaN  NaN
foo one    NaN  NaN
    two    NaN  NaN

bar     True
baz    False
foo     True
dtype: bool

These don't work:

df.A = s # does not raise an error, but does nothing
df.loc[s.index,'A'] = s # raises an error

expected output:

             A     B
bar one    True   NaN
    two    True   NaN
    three  True   NaN
baz one    False  NaN
foo one    True   NaN
    two    True   NaN
Artturi Björk asked May 08 '15 07:05





2 Answers

df.A = s does not raise an error, but does nothing

Indeed, this should have worked. Your point is actually related to mine.

The workaround

>>> s.index = pd.Index((c,) for c in s.index)
>>> df.A = s
>>> df
               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN

Why does the above work?

Because when you do df.A = s directly, without the workaround, you are trying to assign values indexed by a plain pandas.Index into an object indexed by an instance of its subclass, pandas.MultiIndex, which somehow looks like the reverse of what the Liskov substitution principle would suggest. See for yourself:

>>> type(s.index).__name__
'Index'

whereas

>>> type(df.index).__name__
'MultiIndex'

Hence the workaround, which consists of turning s's index into a one-level pandas.MultiIndex instance.

>>> s.index = pd.Index((c,) for c in s.index)
>>> type(s.index).__name__
'MultiIndex'

and nothing has perceptibly changed:

>>> s
bar     True
baz    False
foo     True
dtype: bool
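
As a minimal sketch of the same wrapping step, the one-level MultiIndex can also be built explicitly with pd.MultiIndex.from_arrays instead of relying on pd.Index turning 1-tuples into a MultiIndex. The names wrapped and s_wrapped are hypothetical and only used for illustration. This shows only the index conversion; whether the subsequent df.A = s assignment aligns as shown above may depend on the pandas version.

```python
import pandas as pd

s = pd.Series([True, False, True], index=['bar', 'baz', 'foo'])

# Before wrapping, the series carries a flat Index.
flat_type = type(s.index).__name__

# Wrap the flat labels into a MultiIndex that has a single level.
# `wrapped` and `s_wrapped` are illustrative names, not from the answer.
wrapped = pd.MultiIndex.from_arrays([s.index])
s_wrapped = pd.Series(s.values, index=wrapped)

print(flat_type, type(s_wrapped.index).__name__, s_wrapped.index.nlevels)
```

Each label becomes a 1-tuple such as ('bar',), which is the same shape the generator expression in the workaround produces.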

A thought: from several points of view (mathematical, ontological), all this somehow suggests that pandas.Index should have been designed as a subclass of pandas.MultiIndex, not the other way around, as it currently is.

keepAlive answered Oct 12 '22 15:10



Series (and dictionaries) can be used just like functions with map and apply (thanks to @normanius for improving the syntax):

df['A'] = pd.Series(df.index.get_level_values(0)).map(s).values

Or similarly:

df['A'] = df.reset_index(level=0)['level_0'].map(s).values

Results:

               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN
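
For reference, here is a self-contained sketch of the map approach using the data from the question. It looks up each row's first-level label in s and assigns the result by position. The names first_level, lookup and from_dict are hypothetical helpers introduced here for illustration; the dictionary variant is only there to show that a plain dict gives the same lookup.

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# First-level label of every row, in row order.
first_level = df.index.get_level_values(0)

# Map each label through s; .values drops the intermediate index,
# so the assignment is positional rather than index-aligned.
df['A'] = pd.Series(first_level).map(s).values

# The same lookup with a plain dictionary instead of a Series.
lookup = s.to_dict()
from_dict = pd.Series(first_level).map(lookup).values

print(df)
```

Taking .values matters here: without it, the mapped Series would carry a default integer index, and pandas would try to align that against the MultiIndex of df on assignment.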
JohnE answered Oct 12 '22 16:10
