 

Split columns into MultiIndex with missing columns in pandas

This is similar to a problem I asked about here. However, I found that the data I am working with is not always consistent. For example:

import pandas as pd

df = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,10,11,12]],columns=["X_a","Y_c","X_b","Y_a"])

   X_a  Y_c  X_b  Y_a
0    1    2    3    4
1    5    6    7    8
2    9   10   11   12

As you can see, X has no corresponding c column and Y has no corresponding b column. When I create the multi-level index, I want the DataFrame to look like this:

     X             Y
     a    b   c    a    b   c
0    1    3   -1   4   -1   2
1    5    7   -1   8   -1   6
2    9   11   -1  12   -1  10

So I want the split done in such a way that every upper-level column has the same set of lower-level columns. Since the dataset is positive, I am thinking of filling the missing columns with -1, although I am open to suggestions on this. The closest thing I found to my problem was this answer. However, I could not make it work with a MultiIndex like the one in my previous question. Any help is appreciated.

Gambit1614 asked Sep 16 '17 06:09





1 Answer

Split the column names on the underscore to create a MultiIndex, then assign it to df.columns.

idx = df.columns.str.split('_', expand=True)
idx
MultiIndex(levels=[['X', 'Y'], ['a', 'b', 'c']],
           labels=[[0, 1, 0, 1], [0, 2, 1, 0]])

df.columns = idx
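
The levels/labels printout above appears to come from an older pandas release, and newer versions print a MultiIndex differently. As a small sketch, listing the index as tuples is one version-independent way to check the split; the variable name pairs is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    columns=["X_a", "Y_c", "X_b", "Y_a"],
)

# Split each "upper_lower" name on the underscore; expand=True yields a MultiIndex.
idx = df.columns.str.split("_", expand=True)

# Listing the tuples avoids relying on how a given pandas version prints a MultiIndex.
pairs = idx.tolist()
print(pairs)
```

The tuples keep the original column order, so X_a, Y_c, X_b, Y_a map one-to-one onto (upper, lower) pairs.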

Next, build a new index from the Cartesian product of the existing MultiIndex's levels and use it to reindex the original, so that every upper-level column gets every lower-level column.

idx = pd.MultiIndex.from_product([idx.levels[0], idx.levels[1]])
idx
MultiIndex(levels=[['X', 'Y'], ['a', 'b', 'c']],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])

df.reindex(columns=idx, fill_value=-1)
   X          Y       
   a   b  c   a  b   c
0  1   3 -1   4 -1   2
1  5   7 -1   8 -1   6
2  9  11 -1  12 -1  10
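
Putting the two steps together, here is a minimal self-contained sketch. The helper name split_and_complete and its sep and fill_value parameters are assumptions made for illustration and are not part of pandas.

```python
import pandas as pd


def split_and_complete(df, sep="_", fill_value=-1):
    """Hypothetical helper: split column names into two levels and
    add every missing (upper, lower) combination, filled with fill_value."""
    out = df.copy()
    # Step 1: "X_a" -> ("X", "a"), giving a two-level MultiIndex.
    out.columns = out.columns.str.split(sep, expand=True)
    # Step 2: the Cartesian product of both levels lists every combination.
    full = pd.MultiIndex.from_product(list(out.columns.levels))
    # Step 3: reindex returns a new frame; missing columns get fill_value.
    return out.reindex(columns=full, fill_value=fill_value)


df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    columns=["X_a", "Y_c", "X_b", "Y_a"],
)

result = split_and_complete(df)
print(result)
```

Note that reindex does not modify df in place, so the result has to be assigned. If -1 could ever be confused with real data, leaving out fill_value makes reindex fill the new columns with NaN instead.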
cs95 answered Oct 11 '22 05:10
