
How do I use the Pandas .assign() method in a chain on a MultiIndex column?

For a single-level column index I would do the following:

import numpy as np
import pandas as pd

# Single-level column index named 'first'
index = pd.Index(['one', 'two'], name='first')
df = pd.DataFrame(np.random.randn(3, 2), index=['A', 'B', 'C'], columns=index)
print(df)


first   one two
A   0.919921    -1.407321
B   1.100169    -0.927249
C   -0.520308   0.619783

print(df.assign(one=lambda x: x.one * 100))

first   one         two
A       144.950877  0.633516
B       -0.593133   -0.630641
C       -5.661949   -0.738884

Now, when I have a MultiIndex column, I can access the desired column using .loc, but I cannot pass it to .assign() as a keyword: doing so raises SyntaxError: keyword can't be an expression.

Here is an example,

import numpy as np
import pandas as pd

arrays = [['bar', 'bar'],
          ['one', 'two']]

tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 2), index=['A', 'B', 'C'], columns=index)

print(df)

first   bar
second  one         two
A       1.119243    0.819455
B       -0.473354   -1.340502
C       0.150403    -0.211392

However,

df.assign(('bar', 'one')=lambda x: x.loc[:, ('bar', 'one')] * 10)

SyntaxError: keyword can't be an expression

I can do

df.assign(barOne=lambda x: x.loc[:, ('bar', 'one')] * 10)


first   bar                     barOne
second  one         two 
A       0.433909    0.949701    4.339091
B       0.011486    -1.395144   0.114858
C       -0.289821   2.106951    -2.89821

but this is not desirable. I would like to keep my method chain intact while also keeping the MultiIndex columns.

Little Bobby Tables asked Oct 30 '22 07:10



2 Answers

If I'm reading this correctly, would it not be as simple as:

Original df:

first        bar
second       one       two
A       0.386729  1.014010
B       0.236824  0.439019
C       0.530020 -0.268751

Code:

df[('bar','one')] *= 10

Updated df (modify column):

first         bar
second        one       two
A       3.8672946  1.014010
B       2.3682376  0.439019
C       5.3002040 -0.268751

Or, updated df (create new column):

df[('bar','new')] = df[('bar','one')] * 10

first        bar
second       one       two       new
A       0.386729  1.014010  3.867295
B       0.236824  0.439019  2.368238
C       0.530020 -0.268751  5.300204
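
If the method chain itself matters, one possible workaround is to wrap the direct indexing above in a small function and call it through .pipe(). This is only a sketch: assign_col is a hypothetical helper name, not part of pandas, and it copies the frame on every call so the original df is left untouched.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two')], names=['first', 'second']
)
df = pd.DataFrame(np.random.randn(3, 2), index=['A', 'B', 'C'], columns=columns)


def assign_col(frame, key, func):
    """Hypothetical helper: return a copy of frame with column key set to func(frame)."""
    out = frame.copy()
    out[key] = func(frame)  # a tuple key addresses a MultiIndex column directly
    return out


result = (
    df
    # overwrite the existing ('bar', 'one') column
    .pipe(assign_col, ('bar', 'one'), lambda x: x[('bar', 'one')] * 10)
    # add a new ('bar', 'new') column under the same top level
    .pipe(assign_col, ('bar', 'new'), lambda x: x[('bar', 'two')] + 1)
)
print(result)
```

Each .pipe() step receives the output of the previous one, so the chain reads much like a sequence of .assign() calls while the columns stay a two-level MultiIndex.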
elPastor answered Nov 14 '22 18:11



Just to get more info in the same place: this issue was raised (by you!) on GitHub, and the response was:

you can simply directly index

df[('a', 1)] = ...

.assign cannot support this syntax as it's a function call, where a tuple is not a valid identifier.
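
To illustrate the point about identifiers, here is a small sketch (the all-ones frame is made up for the example) showing that unpacking a dict into .assign() does not get around the restriction either, because Python itself requires keyword names to be strings. Direct indexing with the tuple works.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two')], names=['first', 'second']
)
df = pd.DataFrame(np.ones((3, 2)), index=['A', 'B', 'C'], columns=columns)

# Dict unpacking fails before .assign() even runs:
# Python rejects non-string keyword names with a TypeError.
try:
    df.assign(**{('bar', 'one'): lambda x: x[('bar', 'one')] * 10})
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError

# Direct indexing accepts the tuple as a column key.
df[('bar', 'one')] = df[('bar', 'one')] * 10
print(df)
```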

MokeEire answered Nov 14 '22 19:11
