I have a dataset:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3],
                   'M_start_date_1': [201709, 201709, 201709],
                   'M_end_date_1': [201905, 201905, 201905],
                   'M_start_date_2': [202004, 202004, 202004],
                   'M_end_date_2': [202005, 202005, 202005],
                   'F_start_date_1': [201803, 201803, 201803],
                   'F_end_date_1': [201904, 201904, 201904],
                   'F_start_date_2': [201912, 201912, 201912],
                   'F_end_date_2': [202007, 202007, 202007],
                   })
I need to tabulate it and create a new column based on prefix in columns [1:], to get this output:
I was trying to use the pandas.melt function but got stuck with multiple variables. Has anyone used this function with multiple columns, or is there another way to obtain the output?
apply(pd.Series.explode). This will explode all the columns with lists in your DataFrame.
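A minimal sketch of that idiom, assuming a small hypothetical frame (df_lists, not the dataset from the question) whose cells hold lists of equal length per row. The id column is moved to the index first so that only the list columns are exploded:

```python
import pandas as pd

# Hypothetical frame: each cell of start/end holds a list of periods.
df_lists = pd.DataFrame({
    'id': [1, 2],
    'start': [[201709, 202004], [201803, 201912]],
    'end': [[201905, 202005], [201904, 202007]],
})

# Explode every list column at once; lists in the same row must have
# the same length, otherwise the columns no longer line up.
exploded = (df_lists.set_index('id')
                    .apply(pd.Series.explode)
                    .reset_index())
print(exploded)
```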
Pandas DataFrame: melt() function. The melt() function is used to unpivot a given DataFrame from wide format to long format, optionally leaving identifier variables set. id_vars: column(s) to use as identifier variables. value_vars: column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
The melt() function is useful to massage a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are unpivoted to the row axis, leaving just two non-identifier columns, variable and value.
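As a rough illustration of that description, here is melt() applied to a cut-down version of the question's frame (only two of the date columns are kept to keep the output short). The name long_df is just a placeholder:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'M_start_date_1': [201709, 201709, 201709],
    'M_end_date_1': [201905, 201905, 201905],
})

# 'id' stays as the identifier; every other column is unpivoted into
# a (variable, value) pair, giving one row per original cell.
long_df = df.melt(id_vars='id')
print(long_df)
```

The result has the three columns id, variable and value, where variable holds the old column names.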
We can use the pandas Series.str.split() function to break up strings in multiple columns around a given separator or delimiter. It's similar to the Python string split() method but applies to the entire DataFrame column.
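For the column names in the question, a small sketch of Series.str.split() with expand=True might look like this. The labels cod, kind, unit and n are made up here for readability:

```python
import pandas as pd

names = pd.Series(['M_start_date_1', 'F_end_date_2'])

# expand=True returns one column per piece instead of a list per row.
parts = names.str.split('_', expand=True)
parts.columns = ['cod', 'kind', 'unit', 'n']  # hypothetical labels
print(parts)
```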
The pandas melt() function is used to change the DataFrame format from wide to long. It’s used to create a specific format of the DataFrame object where one or more columns work as identifiers. All the remaining columns are treated as values and unpivoted to the row axis, leaving only two columns: variable and value.
The pandas .melt() is usually the go-to function for transforming a wide dataframe into a long one because it’s flexible and straightforward. df.melt() takes related columns with common values and bundles them into one column called ‘variable’.
Unmelting a DataFrame using the pivot() function: we can use the pivot() function to unmelt a DataFrame object and get the original dataframe. The pivot() function’s ‘index’ parameter value should be the same as the ‘id_vars’ value. The ‘columns’ value should be passed as the name of the ‘variable’ column.
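A short round-trip sketch of that, on a hypothetical two-column frame (wide, a and b are placeholder names). It assumes each id/variable pair occurs only once, which pivot() requires:

```python
import pandas as pd

wide = pd.DataFrame({'id': [1, 2], 'a': [10, 20], 'b': [30, 40]})

# Wide -> long: 'id' is the identifier, 'a' and 'b' are unpivoted.
long_df = wide.melt(id_vars='id')

# Long -> wide: index matches id_vars, columns is the 'variable' column.
restored = long_df.pivot(index='id', columns='variable', values='value')
restored = restored.reset_index()
restored.columns.name = None  # drop the leftover 'variable' label
print(restored)
```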
The task: move all the Month columns under one column called ‘Month’. The values in their cells will be placed in another column called ‘Score’.
The main idea is to convert the id column to the index, then split all the other columns by _ to get a MultiIndex and reshape with DataFrame.stack. DataFrame.sort_index is then used for the correct order, DataFrame.reset_index removes the unnecessary levels, DataFrame.rename_axis sets the index names that become the new column names, and a last reset_index converts the index to columns:
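df1 = df.set_index('id')
# 'M_start_date_1' -> ('M', 'start', 'date', '1'): 4-level column MultiIndex
df1.columns = df1.columns.str.split('_', expand=True)
# move levels 0, 2 and 3 into the rows, leaving start/end as columns
df1 = (df1.stack(level=[0,2,3])
          # id ascending, prefix descending (M before F)
          .sort_index(level=[0,1], ascending=[True, False])
          # drop the 'date' and number levels
          .reset_index(level=[2,3], drop=True)
          # put start before end
          .sort_index(axis=1, ascending=False)
          .rename_axis(['id','cod'])
          .reset_index())
print(df1)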
    id cod   start     end
0    1   M  201709  201905
1    1   M  202004  202005
2    1   F  201803  201904
3    1   F  201912  202007
4    2   M  201709  201905
5    2   M  202004  202005
6    2   F  201803  201904
7    2   F  201912  202007
8    3   M  201709  201905
9    3   M  202004  202005
10   3   F  201803  201904
11   3   F  201912  202007
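Since the question asked about melt specifically, here is a sketch of a melt-based route to the same table. It is not the approach used in the answer above, only an alternative under some assumptions: pandas 1.1 or newer (pivot with a list for index), and column names that always follow the prefix_kind_date_number pattern. The helper column names kind and n are made up:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'M_start_date_1': [201709, 201709, 201709],
                   'M_end_date_1': [201905, 201905, 201905],
                   'M_start_date_2': [202004, 202004, 202004],
                   'M_end_date_2': [202005, 202005, 202005],
                   'F_start_date_1': [201803, 201803, 201803],
                   'F_end_date_1': [201904, 201904, 201904],
                   'F_start_date_2': [201912, 201912, 201912],
                   'F_end_date_2': [202007, 202007, 202007]})

# Step 1: one row per (id, original column name).
long_df = df.melt(id_vars='id')

# Step 2: 'M_start_date_1' -> cod='M', kind='start', n='1'
# (the constant 'date' token at position 2 is ignored).
parts = long_df['variable'].str.split('_', expand=True)
long_df['cod'] = parts[0]
long_df['kind'] = parts[1]
long_df['n'] = parts[3]

# Step 3: spread start/end back into two columns per (id, cod, n).
out = (long_df.pivot(index=['id', 'cod', 'n'], columns='kind', values='value')
              .reset_index()
              .drop(columns='n')[['id', 'cod', 'start', 'end']])
out.columns.name = None
print(out)
```

One difference to keep in mind: pivot sorts the index, so within each id the F rows come before the M rows here, whereas the answer above lists M first.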