Pandas Melt several groups of columns into multiple target columns by name

I would like to melt several groups of columns of a dataframe into multiple target columns, similar to the questions "Python Pandas Melt Groups of Initial Columns Into Multiple Target Columns" and "pandas dataframe reshaping/stacking of multiple value variables into seperate columns". However, I need to do this explicitly by column name rather than by index location.

import pandas as pd
# Include the 'id' column so the frame matches the one displayed below
df = pd.DataFrame([(101, 'a','b','c',1,2,3,'aa','bb','cc'), (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
                  columns=['id', 'a_1', 'a_2', 'a_3','b_1', 'b_2', 'b_3','c_1', 'c_2', 'c_3'])
df

Original Dataframe:

    id   a_1  a_2  a_3  b_1  b_2  b_3  c_1  c_2  c_3
0   101   a    b    c    1    2    3    aa   bb   cc
1   102   d    e    f    4    5    6    dd   ee   ff

Target Dataframe:

     id   a   b   c
0   101   a   1   aa
1   101   b   2   bb
2   101   c   3   cc
3   102   d   4   dd
4   102   e   5   ee
5   102   f   6   ff

Advice is much appreciated on an approach to this.

Nick D asked Aug 10 '16 01:08



2 Answers

There is a more efficient way to handle this type of problem, which involves melting several different sets of columns: pd.wide_to_long is built for exactly this situation.

pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='id', j='dropme', sep='_')\
  .reset_index()\
  .drop('dropme', axis=1)\
  .sort_values('id')

    id  a  b   c
0  101  a  1  aa
2  101  b  2  bb
4  101  c  3  cc
1  102  d  4  dd
3  102  e  5  ee
5  102  f  6  ff
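
For reference, here is a self-contained sketch of the same wide_to_long approach, assuming the frame carries the 'id' column shown in the question. The names `long_df` and `num` are illustrative and not part of the original answer. Sorting by both 'id' and the extracted suffix is a small variation: the default sort in sort_values is not guaranteed to be stable, so sorting on 'id' alone may not preserve the 1, 2, 3 order within each id.

```python
import pandas as pd

# Frame from the question, with the 'id' column included
df = pd.DataFrame(
    [(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'],
)

long_df = (
    # Stubs 'a', 'b', 'c' are matched by name; the numeric suffix after '_' goes to 'num'
    pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='id', j='num', sep='_')
      .reset_index()
      .sort_values(['id', 'num'])   # explicit ordering within each id
      .drop(columns='num')          # the suffix is not needed in the target frame
      .reset_index(drop=True)
)
print(long_df)
```

Because the stubs are given as names, the order of the columns in the original frame should not matter here.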
Ted Petrou answered Sep 28 '22 11:09



You can convert the column names to a MultiIndex based on the column-name pattern and then stack at a particular level, depending on the result you need:

import pandas as pd
df.set_index('id', inplace=True)
df.columns = pd.MultiIndex.from_tuples(tuple(df.columns.str.split("_")))
df.stack(level = 1).reset_index(level = 1, drop = True).reset_index()

# id    a   b    c      
#101    a   1   aa
#101    b   2   bb
#101    c   3   cc
#102    d   4   dd
#102    e   5   ee
#102    f   6   ff
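
A variant of the same idea that leaves the original frame untouched might look like the sketch below. It assumes the 'id' column from the question is present and uses str.split with expand=True, which returns a MultiIndex directly when called on an Index. The names `wide` and `stacked_df` are illustrative only.

```python
import pandas as pd

# Frame from the question, with the 'id' column included
df = pd.DataFrame(
    [(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'],
)

# set_index without inplace=True returns a new frame, so df is not modified
wide = df.set_index('id')

# 'a_1' -> ('a', '1'): level 0 holds the group name, level 1 the suffix
wide.columns = wide.columns.str.split('_', expand=True)

stacked_df = (
    wide.stack(level=1)                    # move the suffix level into the row index
        .reset_index(level=1, drop=True)   # discard the suffix
        .reset_index()                     # bring 'id' back as a column
)
print(stacked_df)
```

Recent pandas versions may emit a FutureWarning for stack about its upcoming default implementation; the result for this frame should be unaffected.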
Psidom answered Sep 28 '22 10:09
