Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to transform or melt column in python?

I have this tabular table, but I want to make it into a simple form. The existing form is like this:

group      mp_current   mh_current   mp_total   mh_total 
contractor       25         4825        0          0

I want to transform the table into this form:

group        mp    mh    period
contractor   25   4825   current
contractor   0      0    total

where I would have one dedicated column for mp and mh, and one extra column as the period column.

How can I perform this in python?

like image 370
yangyang Avatar asked Dec 17 '22 11:12

yangyang


2 Answers

wide_to_long

You specify the stubnames (column prefixes), the separator ('_'), and that the suffix is anything ('.*') as it by default expects numerics. The j argument becomes the column label for the the values after the separator. The column referenced by i needs to uniquely label each row.

df1 = (pd.wide_to_long(df, i='group', j='period', 
                       stubnames=['mh', 'mp'], sep='_', suffix='.*')
         .reset_index())

        group   period    mh  mp
0  contractor  current  4825  25
1  contractor    total     0   0
like image 111
ALollz Avatar answered Jan 04 '23 23:01

ALollz


As proffered in the previous solution, wide_to_long solves this quite well. An alternative is the pivot_longer function from pyjanitor (they are just wrappers of pandas functions/methods):

# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(index='group', 
                names_to=('.value', 'period'), 
                names_sep='_')
 
        group   period  mp    mh
0  contractor  current  25  4825
1  contractor    total   0     0

The .value is an indicator (similar to stubnames in wide_to_long but much more flexible), informing pivot_longer to retain those parts of the column names associated with it as headers, while the others are collected into the period column. Since the column parts are separated by _, this is used in the names_sep argument. .value can be any part of the column names, as long as you can readily identify it with a regular expression (names_pattern) or a delimiter (names_sep).

You can also pull this off with the stack method, where you set an index, split the columns, stack the relevant level and finally reset the index:

df = df.set_index('group')
df.columns = df.columns.str.split('_', expand = True)
df.columns.names = [None, 'period']
df.stack(-1).reset_index()
 
        group   period    mh  mp
0  contractor  current  4825  25
1  contractor    total     0   0
like image 28
sammywemmy Avatar answered Jan 05 '23 00:01

sammywemmy