For a column index with a single level, I would do the following:
import numpy as np
import pandas as pd
arrays = [['one', 'two']]
tuples = list(zip(*arrays))
# only one level here, so only one level name
index = pd.MultiIndex.from_tuples(tuples, names=['first'])
df = pd.DataFrame(np.random.randn(3, 2), index=['A', 'B', 'C'], columns=index)
print(df)
first one two
A 0.919921 -1.407321
B 1.100169 -0.927249
C -0.520308 0.619783
print(df.assign(one=lambda x: x.one * 100))
first one two
A 144.950877 0.633516
B -0.593133 -0.630641
C -5.661949 -0.738884
Now, when the columns are a MultiIndex, I can access the desired column using .loc, but I cannot assign to it with .assign: the call fails with SyntaxError: keyword can't be an expression.
Here is an example,
import numpy as np
import pandas as pd
arrays = [['bar', 'bar'],
          ['one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 2), index=['A', 'B', 'C'], columns=index)
print(df)
first bar
second one two
A 1.119243 0.819455
B -0.473354 -1.340502
C 0.150403 -0.211392
However,
df.assign(('bar', 'one')=lambda x: x.loc[:, ('bar', 'one')] * 10)
SyntaxError: keyword can't be an expression
I can do
df.assign(barOne=lambda x: x.loc[:, ('bar', 'one')] * 10)
first bar barOne
second one two
A 0.433909 0.949701 4.339091
B 0.011486 -1.395144 0.114858
C -0.289821 2.106951 -2.89821
but this is not desirable. I would like to keep my method chain tidy while also keeping the MultiIndex columns.
If I'm reading this correctly, would it not be as simple as:
Original df:
first bar
second one two
A 0.386729 1.014010
B 0.236824 0.439019
C 0.530020 -0.268751
Code:
df[('bar','one')] *= 10
Updated df (modify column):
first bar
second one two
A 3.8672946 1.014010
B 2.3682376 0.439019
C 5.3002040 -0.268751
Or, updated df (create new column):
df[('bar','new')] = df[('bar','one')] * 10
first bar
second one two new
A 0.386729 1.014010 3.867295
B 0.236824 0.439019 2.368238
C 0.530020 -0.268751 5.300204
Just to get more info in the same place - here's this issue raised (by you!) on GitHub and the response was:
you can simply directly index
df[('a', 1)] = ...
.assign cannot support this syntax as its a function call, where a tuple is not a valid identifier.
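If the goal is to stay inside a method chain, one possible workaround is to wrap the direct indexing in a small helper and call it through .pipe. This is only a sketch: assign_column is a hypothetical helper name, not part of pandas, and the example data is made up so the result is easy to check. Dictionary unpacking into .assign does not help here either, because Python requires keyword names to be strings.

```python
import numpy as np
import pandas as pd


def assign_column(frame, key, func):
    """Hypothetical helper: return a copy of `frame` with column `key` set to `func(frame)`."""
    out = frame.copy()      # leave the caller's DataFrame untouched, like .assign does
    out[key] = func(out)    # direct indexing accepts a tuple key for MultiIndex columns
    return out


columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two")], names=["first", "second"]
)
# Deterministic example values instead of random ones
df = pd.DataFrame(
    np.arange(6, dtype=float).reshape(3, 2),
    index=["A", "B", "C"],
    columns=columns,
)

# .pipe passes df as the first argument, followed by the extra positional arguments
result = df.pipe(assign_column, ("bar", "one"), lambda x: x[("bar", "one")] * 10)
print(result)
```

Because the helper works on a copy, the original df is unchanged and the call can sit anywhere in a longer chain, just as .assign would.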