I would like to melt several groups of columns of a dataframe into multiple target columns. This is similar to the questions "Python Pandas Melt Groups of Initial Columns Into Multiple Target Columns" and "pandas dataframe reshaping/stacking of multiple value variables into seperate columns". However, I need to do this explicitly by column name rather than by index location.
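import pandas as pd
# The 'id' column (101, 102) is included so the frame matches the printed output below.
df = pd.DataFrame([(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'), (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
                  columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'])
df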
Original Dataframe:
    id a_1 a_2 a_3  b_1  b_2  b_3 c_1 c_2 c_3
0  101   a   b   c    1    2    3  aa  bb  cc
1  102   d   e   f    4    5    6  dd  ee  ff
Target Dataframe:
    id  a  b   c
0  101  a  1  aa
1  101  b  2  bb
2  101  c  3  cc
3  102  d  4  dd
4  102  e  5  ee
5  102  f  6  ff
Advice is much appreciated on an approach to this.
There is a more efficient way to handle this type of problem, which involves melting several different sets of columns: pd.wide_to_long is built for exactly this situation.
pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='id', j='dropme', sep='_')\
.reset_index()\
.drop('dropme', axis=1)\
.sort_values('id')
    id  a  b   c
0  101  a  1  aa
2  101  b  2  bb
4  101  c  3  cc
1  102  d  4  dd
3  102  e  5  ee
5  102  f  6  ff
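For reference, here is a self-contained sketch of the same pd.wide_to_long call that can be run end to end. It assumes the frame has an 'id' column holding 101 and 102, as in the question's printed output. The suffix column name 'n' and the result name long_df are illustrative choices, not part of the original answer. Sorting on both 'id' and the suffix, instead of 'id' alone, keeps the row order from depending on how ties are broken.

```python
import pandas as pd

# Assumption: 'id' is a real column (101, 102), as in the question's printed frame.
df = pd.DataFrame(
    [(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'],
)

long_df = (
    # stubnames are the column-name prefixes; the numeric suffix after '_' goes into 'n'
    pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='id', j='n', sep='_')
    .reset_index()
    .sort_values(['id', 'n'])   # sort on both keys so ties within an id are ordered
    .drop(columns='n')          # the suffix is only needed for ordering
    .reset_index(drop=True)     # renumber rows 0..5
)
print(long_df)
```

Because the groups are selected through stubnames, the reshape depends only on column names, not on column positions.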
You can convert the column names to a MultiIndex based on their naming pattern, then stack at a particular level depending on the result you need:
import pandas as pd
df.set_index('id', inplace=True)
df.columns = pd.MultiIndex.from_tuples(tuple(df.columns.str.split("_")))
df.stack(level = 1).reset_index(level = 1, drop = True).reset_index()
# id a b c
#101 a 1 aa
#101 b 2 bb
#101 c 3 cc
#102 d 4 dd
#102 e 5 ee
#102 f 6 ff
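As a rough, self-contained variant of the stacking approach, the sketch below avoids modifying df in place. It again assumes an 'id' column with values 101 and 102. The level names 'group' and 'n' and the variable names wide and stacked are hypothetical labels added for readability.

```python
import pandas as pd

# Assumption: 'id' is a real column (101, 102), as in the question's printed frame.
df = pd.DataFrame(
    [(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'],
)

wide = df.set_index('id')  # returns a new frame; df itself is left unchanged

# Split 'a_1' into ('a', '1') so the prefix and the suffix become two column levels.
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split('_')) for name in wide.columns],
    names=['group', 'n'],
)

stacked = (
    wide.stack(level='n')               # move the suffix level from columns to rows
    .reset_index(level='n', drop=True)  # discard the suffix once rows are formed
    .reset_index()                      # turn 'id' back into a regular column
    .rename_axis(columns=None)          # clear the leftover 'group' columns label
)
print(stacked)
```

Naming the levels lets stack and reset_index refer to 'n' by name instead of by position, which is in keeping with the question's request to work by name.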