Consider a pandas df with columns containing tuples of equal length.
import pandas as pd

L1 = [['ID1', ('key1a', 'key1b', 'key1c'), ('value1a', 'value1b', 'value1c')],
      ['ID2', ('key2a', 'key2b', 'key2c'), ('value2a', 'value2b', 'value2c')]]
df1 = pd.DataFrame(L1, columns=['ID', 'Key', 'Value'])
>>> df1
    ID                    Key                        Value
0  ID1  (key1a, key1b, key1c)  (value1a, value1b, value1c)
1  ID2  (key2a, key2b, key2c)  (value2a, value2b, value2c)
What's the easiest way to unfold this vertically, as follows?
    ID    Key    Value
0  ID1  key1a  value1a
1  ID1  key1b  value1b
2  ID1  key1c  value1c
3  ID2  key2a  value2a
4  ID2  key2b  value2b
5  ID2  key2c  value2c
To split a column of tuples in a pandas DataFrame into separate columns, we can use the column's tolist method. We create the df DataFrame with the pd.DataFrame class and a dictionary. Then we create a new DataFrame from the list of tuples returned by df['b'].tolist().
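A minimal sketch of the tolist approach, assuming a hypothetical df whose column 'b' holds 2-tuples; the column names 'a', 'b1' and 'b2' are illustrative only. Note that this splits each tuple across columns (horizontally), not across rows.

```python
import pandas as pd

# Hypothetical frame: column 'b' holds 2-tuples
df = pd.DataFrame({'a': [1, 2], 'b': [(1, 2), (3, 4)]})

# tolist() returns a plain list of tuples; the DataFrame constructor turns
# each tuple into a row, so every tuple position becomes its own column.
# Reusing df.index keeps the new rows aligned with the original ones.
split = pd.DataFrame(df['b'].tolist(), index=df.index, columns=['b1', 'b2'])

# Replace the tuple column with the split columns
result = df.drop(columns='b').join(split)
print(result)
```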
The pandas melt() function changes a DataFrame from wide to long format. One or more columns act as identifier variables, while all remaining columns, considered measured variables, are unpivoted to the row axis, leaving just two non-identifier columns: variable and value.
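A small sketch of melt on a hypothetical frame (the columns 'x' and 'y' and their numbers are made up for illustration), showing 'ID' kept as the identifier while the other columns become variable/value pairs. Note that melt moves whole cell values, so applied to df1 it would yield rows holding entire tuples rather than individual keys and values.

```python
import pandas as pd

# Hypothetical wide frame: 'ID' identifies each row, 'x' and 'y' are measured columns
wide = pd.DataFrame({'ID': ['ID1', 'ID2'], 'x': [1, 2], 'y': [3, 4]})

# Unpivot: each (ID, measured column) pair becomes one row
long = wide.melt(id_vars='ID', value_vars=['x', 'y'])
print(long)
```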
pd.melt allows you to 'unpivot' data from 'wide format' into 'long format'. This was perfect for my task: taking 'wide format' economic data, where each column represents a year, and turning it into 'long format' data, where each row represents a single data point.
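The year-columns case might look like this sketch; the 'country' and 'gdp' names and all numbers are hypothetical toy data, not real economic figures. It uses var_name and value_name to rename the default output columns.

```python
import pandas as pd

# Hypothetical wide-format data: one row per country, one column per year (toy numbers)
wide = pd.DataFrame({'country': ['A', 'B'], '2019': [1.0, 2.0], '2020': [1.5, 2.5]})

# Every column not listed in id_vars is unpivoted; var_name/value_name
# replace the default 'variable'/'value' column names.
long = wide.melt(id_vars='country', var_name='year', value_name='gdp')

# Year labels come from column names (strings), so convert them, then order rows
long['year'] = long['year'].astype(int)
long = long.sort_values(['country', 'year'], ignore_index=True)
print(long)
```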
quick solution
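s = df1.set_index('ID').stack()  # one entry per (ID, column), each holding a tuple
pd.DataFrame(s.tolist(), index=s.index).stack().unstack(1).reset_index(level=1, drop=True).reset_index()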
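On pandas 1.3 or later, DataFrame.explode accepts a list of columns, which avoids the stack/unstack round trip entirely. This is a sketch of that alternative technique, not the quick solution's method; it rebuilds df1 so it runs on its own, and it assumes the tuples in each row have equal length (explode raises a ValueError otherwise).

```python
import pandas as pd

df1 = pd.DataFrame(
    [['ID1', ('key1a', 'key1b', 'key1c'), ('value1a', 'value1b', 'value1c')],
     ['ID2', ('key2a', 'key2b', 'key2c'), ('value2a', 'value2b', 'value2c')]],
    columns=['ID', 'Key', 'Value'],
)

# Explode both tuple columns together: position i of 'Key' stays paired with
# position i of 'Value', and 'ID' is repeated for every new row.
# ignore_index=True renumbers the result 0..n-1.
out = df1.explode(['Key', 'Value'], ignore_index=True)
print(out)
```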