
Pandas melt multiple columns to tabulate a dataset

Tags: python, pandas

I have a dataset:

import pandas as pd
df = pd.DataFrame({'id':[1,2,3],
               'M_start_date_1':[201709,201709, 201709],
               'M_end_date_1':[201905, 201905, 201905],
               'M_start_date_2':[202004, 202004, 202004],
               'M_end_date_2':[202005, 202005, 202005],
               'F_start_date_1':[201803, 201803, 201803],
               'F_end_date_1':[201904, 201904, 201904],
               'F_start_date_2':[201912, 201912, 201912],
               'F_end_date_2':[202007, 202007, 202007],                   
               })

I need to tabulate it and create a new column based on the prefix of the columns in [1:] (the desired output was attached as an image).

I was trying to use the pandas.melt function but got stuck with multiple variables. Has anyone used this function with multiple columns, or is there another way to obtain the output?

Vero asked Sep 25 '20 09:09



1 Answer

The main idea is to convert the id column to the index, then split the remaining column names on _ to build a MultiIndex and reshape it with DataFrame.stack. DataFrame.sort_index then gives the correct row order, DataFrame.reset_index removes the unnecessary levels, a second sort_index on the columns puts start before end, DataFrame.rename_axis names the index levels that become the new columns, and a final reset_index converts them to columns:

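# Move id into the index so that only the date columns remain
df1 = df.set_index('id')
# 'M_start_date_1' -> ('M', 'start', 'date', '1'): a 4-level column MultiIndex
df1.columns = df1.columns.str.split('_', expand=True)
df1 = (df1.stack(level=[0,2,3])                              # keep only start/end as columns
          .sort_index(level=[0,1], ascending=[True, False])  # id ascending, M before F
          .reset_index(level=[2,3], drop=True)               # drop the 'date' and number levels
          .sort_index(axis=1, ascending=False)               # put start before end
          .rename_axis(['id','cod'])                         # name the remaining index levels
          .reset_index())                                    # turn them into columns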
print (df1)
    id cod   start     end
0    1   M  201709  201905
1    1   M  202004  202005
2    1   F  201803  201904
3    1   F  201912  202007
4    2   M  201709  201905
5    2   M  202004  202005
6    2   F  201803  201904
7    2   F  201912  202007
8    3   M  201709  201905
9    3   M  202004  202005
10   3   F  201803  201904
11   3   F  201912  202007
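
Since the question asks about melt specifically, here is a minimal sketch of a melt-based alternative. It is not the approach used above. The intermediate names (long, parts, kind, n) are made up for illustration, and passing a list to the index argument of pivot assumes pandas 1.1 or newer. The rows come out in a different order than above (F before M), so sort afterwards if the order matters.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'M_start_date_1': [201709, 201709, 201709],
                   'M_end_date_1': [201905, 201905, 201905],
                   'M_start_date_2': [202004, 202004, 202004],
                   'M_end_date_2': [202005, 202005, 202005],
                   'F_start_date_1': [201803, 201803, 201803],
                   'F_end_date_1': [201904, 201904, 201904],
                   'F_start_date_2': [201912, 201912, 201912],
                   'F_end_date_2': [202007, 202007, 202007]})

# Unpivot every non-id column into (column name, value) pairs
long = df.melt(id_vars='id', var_name='column', value_name='value')

# 'M_start_date_1' -> cod='M', kind='start', n='1' (the 'date' part is unused)
parts = long['column'].str.split('_', expand=True)
long['cod'] = parts[0]
long['kind'] = parts[1]
long['n'] = parts[3]

# Spread start/end back into two columns, one row per (id, cod, n)
out = (long.pivot(index=['id', 'cod', 'n'], columns='kind', values='value')
           .reset_index()
           .rename_axis(columns=None)
           .loc[:, ['id', 'cod', 'start', 'end']])
print(out)
```

The period number n is needed as part of the pivot index because otherwise each (id, cod) pair would have two start values and pivot would raise on the duplicates.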
jezrael answered Nov 15 '22 04:11