I have a MultiIndex DataFrame with two index levels and roughly 100 columns. I would like to get groups of values (organized in columns) based on the presence of a certain value, but I am still struggling with the indexing mechanics.
Here is some example data:
import numpy as np
import pandas as pd
index_arrays = [np.array(["one"]*5+["two"]*5),
np.array(["aaa","bbb","ccc","ddd","eee"]*2)]
df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9],
[10,11,12],[13,14,15],[16,1,17],
[18,19,20],[21,22,23],[24,25,26],
[27,28,29]], index=index_arrays)
This gives:
0 1 2
one aaa 1 2 3
bbb 4 5 6
ccc 7 8 9
ddd 10 11 12
eee 13 14 15
two aaa 16 1 17
bbb 18 19 20
ccc 21 22 23
ddd 24 25 26
eee 27 28 29
Now, for each level_0 index (one and two), I want to return the entire column in which the value at level_1 index aaa equals a certain value, for example 1.
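One way to express the "aaa equals 1" condition for every level_0 group at once is sketched below, assuming the target value is 1 as in the example. DataFrame.xs with level=1 pulls out the aaa row of each group, and comparing it with the target gives a per-group boolean mask over the columns.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
index_arrays = [np.array(["one"] * 5 + ["two"] * 5),
                np.array(["aaa", "bbb", "ccc", "ddd", "eee"] * 2)]
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15],
                   [16, 1, 17], [18, 19, 20], [21, 22, 23], [24, 25, 26], [27, 28, 29]],
                  index=index_arrays)

# One row per level_0 group: that group's 'aaa' row, with level_1 dropped
aaa_rows = df.xs("aaa", level=1)
# True where the group's 'aaa' value equals the target (assumed to be 1)
col_mask = aaa_rows == 1
print(col_mask)
```

Each row of col_mask marks the column(s) to keep for that group.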
What I got so far is this:
df[df.loc[(slice(None), "aaa"),:]==1].any(axis=1)
This returns:
one aaa True
bbb False
ccc False
ddd False
eee False
two aaa True
bbb False
ccc False
ddd False
eee False
Instead of the boolean values, I would like to retrieve the actual values. The expected output would be:
0
one aaa 1
bbb 4
ccc 7
ddd 10
eee 13
two aaa 1
bbb 19
ccc 22
ddd 25
eee 28
I would appreciate your help.
Bonus question: it would also be great to know which column contains the values in question. For the example above, this would be column 0 (for index one) and column 1 (for index two). Is there a way to do this?
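For the bonus question, a minimal sketch, assuming exactly one matching column per group and a target value of 1: build the per-group mask from the aaa rows and use idxmax along the columns, which on booleans returns the label of the first True column. Groups with no match are dropped first, because idxmax would otherwise silently report the first column for them.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
index_arrays = [np.array(["one"] * 5 + ["two"] * 5),
                np.array(["aaa", "bbb", "ccc", "ddd", "eee"] * 2)]
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15],
                   [16, 1, 17], [18, 19, 20], [21, 22, 23], [24, 25, 26], [27, 28, 29]],
                  index=index_arrays)

# True where a group's 'aaa' row equals the target value
col_mask = df.xs("aaa", level=1) == 1
# Keep only groups that have at least one matching column
has_match = col_mask.any(axis=1)
# For each remaining group, the label of the first column where the mask is True
matching_col = col_mask[has_match].idxmax(axis=1)
print(matching_col)
```

If a group can match in several columns, idxmax reports only the leftmost one.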
Thanks!
This might be what you're looking for:
df.loc[df.index.get_level_values(0) == 'one', df.loc[('one', 'aaa')] == 1]
This outputs:
0
one aaa 1
bbb 4
ccc 7
ddd 10
eee 13
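To make the one-liner above easier to adapt, here is the same selection split into its two boolean indexers. The row indexer is a NumPy array marking the rows of the one group; the column indexer is the ('one', 'aaa') row itself compared with 1, so .loc keeps only the columns where that row matches. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
index_arrays = [np.array(["one"] * 5 + ["two"] * 5),
                np.array(["aaa", "bbb", "ccc", "ddd", "eee"] * 2)]
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15],
                   [16, 1, 17], [18, 19, 20], [21, 22, 23], [24, 25, 26], [27, 28, 29]],
                  index=index_arrays)

# Row indexer: True for every row whose level_0 label is 'one'
row_mask = df.index.get_level_values(0) == "one"
# Column indexer: the ('one', 'aaa') row as a Series over the columns, compared with 1
col_mask = df.loc[("one", "aaa")] == 1
subset = df.loc[row_mask, col_mask]
print(subset)
```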
To combine the results for every value of the first index level, build one such DataFrame per value and concatenate them:
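pieces = []
for level_0_val in df.index.get_level_values(0).unique():
    # rows of this group, restricted to the columns where its 'aaa' row equals 1
    piece = df.loc[df.index.get_level_values(0) == level_0_val, df.loc[(level_0_val, 'aaa')] == 1]
    pieces.append(piece)
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
output_df = pd.concat(pieces)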
Here is output_df:
0 1
one aaa 1.0 NaN
bbb 4.0 NaN
ccc 7.0 NaN
ddd 10.0 NaN
eee 13.0 NaN
two aaa NaN 1.0
bbb NaN 19.0
ccc NaN 22.0
ddd NaN 25.0
eee NaN 28.0
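Each row of output_df has exactly one non-NaN value, so you can collapse it back into the single column of your expected output by taking that value from each row.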
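A minimal sketch of that last step, assuming each row has exactly one non-NaN cell: back-fill along the columns so the surviving value lands in the first column, keep that column, and cast back to int (the NaNs introduced by the concatenation had turned it into float).

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
index_arrays = [np.array(["one"] * 5 + ["two"] * 5),
                np.array(["aaa", "bbb", "ccc", "ddd", "eee"] * 2)]
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15],
                   [16, 1, 17], [18, 19, 20], [21, 22, 23], [24, 25, 26], [27, 28, 29]],
                  index=index_arrays)

# Same per-group selection as the loop above, written as a list comprehension
level_0 = df.index.get_level_values(0)
pieces = [df.loc[level_0 == g, df.loc[(g, "aaa")] == 1] for g in level_0.unique()]
output_df = pd.concat(pieces)

# Move each row's single non-NaN value into the first column, then keep only that column
result = output_df.bfill(axis=1).iloc[:, 0].astype(int).to_frame()
print(result)
```

If a row could hold more than one non-NaN value, this keeps the leftmost one.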