This is similar to the problem I asked here. However, I found out that the data I am working is not always consistent. For, example say :
import pandas as pd
df = pd.DataFrame(pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,10,11,12]],columns=["X_a","Y_c","X_b","Y_a"]))
X_a Y_c X_b Y_a
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
Now you can see that X
does not have corresponding c
column and Y
does not have corresponding b
column. Now when I want to create the multi-level index, I want the dataframe to look like this:
X Y
a b c a b c
0 1 3 -1 4 -1 2
1 5 7 -1 8 -1 6
2 9 11 -1 12 -1 10
So as you can see, I want the split in such a way that all upper level columns should have the same lower level columns. Since, the dataset is positve, I am thinking of filling the missing columns with -1, although I am open for suggestions on this. The closest thing I found to my problem was this answer. However, I cannot make it to somehow work with MultiLevel Index like in my previous question. Any help is appreciated.
Using slicers You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level.
In Pandas, the apply() method can also be used to split one column values into multiple columns. The DataFrame. apply method() can execute a function on all values of single or multiple columns. Then inside that function, we can split the string value to multiple values.
pandas MultiIndex to ColumnsUse pandas DataFrame. reset_index() function to convert/transfer MultiIndex (multi-level index) indexes to columns. The default setting for the parameter is drop=False which will keep the index values as columns and set the new index to DataFrame starting from zero.
Practical Data Science using Python To drop multiple levels from a multi-level column index, use the columns. droplevel() repeatedly. We have used the Multiindex. from_tuples() is used to create indexes column-wise.
Create a MultiIndex
and set df.columns
.
idx = df.columns.str.split('_', expand=True)
idx
MultiIndex(levels=[['X', 'Y'], ['a', 'b', 'c']],
labels=[[0, 1, 0, 1], [0, 2, 1, 0]])
df.columns = idx
Now, with the existing MultiIndex
, create a new index and use that to reindex
the original.
idx = pd.MultiIndex.from_product([idx.levels[0], idx.levels[1]])
idx
MultiIndex(levels=[['X', 'Y'], ['a', 'b', 'c']],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
df.reindex(columns=idx, fill_value=-1)
X Y
a b c a b c
0 1 3 -1 4 -1 2
1 5 7 -1 8 -1 6
2 9 11 -1 12 -1 10
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With