
Pandas: Melting columns containing tuples

Consider a pandas DataFrame in which some columns contain tuples of equal length.

import pandas as pd

L1 = [['ID1', ('key1a', 'key1b', 'key1c'), ('value1a', 'value1b', 'value1c')],
      ['ID2', ('key2a', 'key2b', 'key2c'), ('value2a', 'value2b', 'value2c')]]
df1 = pd.DataFrame(L1, columns=['ID', 'Key', 'Value'])

>>> df1
    ID                    Key                        Value
0  ID1  (key1a, key1b, key1c)  (value1a, value1b, value1c)
1  ID2  (key2a, key2b, key2c)  (value2a, value2b, value2c)

What's the easiest way to unfold this vertically, so that each tuple element gets its own row, as follows?

    ID    Key    Value
0  ID1  key1a  value1a
1  ID1  key1b  value1b
2  ID1  key1c  value1c
3  ID2  key2a  value2a
4  ID2  key2b  value2b
5  ID2  key2c  value2c
bigO6377 asked Apr 21 '16 14:04



1 Answer

Quick solution:

df1.set_index('ID').stack().apply(lambda x: pd.Series(x)).unstack(0).T.reset_index()
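
To see what the one-liner does, it may help to split it into steps. The sketch below is not the exact code above: it swaps `.apply(lambda x: pd.Series(x))` for `pd.DataFrame(stacked.tolist(), ...)`, which should build the same intermediate table, and it adds a final tidy-up (my assumption about what is wanted), because `reset_index()` leaves an extra `level_0` column holding the tuple position and the rows come out ordered by position rather than by ID. The names `stacked`, `wide`, `long_form` and `result` are made up for illustration.

```python
import pandas as pd

L1 = [['ID1', ('key1a', 'key1b', 'key1c'), ('value1a', 'value1b', 'value1c')],
      ['ID2', ('key2a', 'key2b', 'key2c'), ('value2a', 'value2b', 'value2c')]]
df1 = pd.DataFrame(L1, columns=['ID', 'Key', 'Value'])

# Step 1: one entry per (ID, column name) pair; each value is still a tuple.
stacked = df1.set_index('ID').stack()

# Step 2: spread each tuple across columns 0, 1, 2
# (stand-in for .apply(lambda x: pd.Series(x)) in the one-liner).
wide = pd.DataFrame(stacked.tolist(), index=stacked.index)

# Step 3: move ID into the columns, transpose so Key/Value become columns,
# then turn the (position, ID) index back into ordinary columns.
long_form = wide.unstack(0).T.reset_index()

# Tidy-up (assumption): order rows by ID then tuple position,
# and drop the helper column that holds the position.
result = (long_form.sort_values(['ID', 'level_0'])
                   .drop(columns='level_0')
                   .reset_index(drop=True))

print(result)
```

If the row order and the extra column do not matter for your use case, the tidy-up step can be skipped.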
piRSquared answered Oct 23 '22 05:10

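
For comparison, here is a different route that is not part of the answer above: a minimal sketch that pairs the tuples up in plain Python with `zip` and builds the long table directly. It assumes, as in the question, that the `Key` and `Value` tuples in each row have the same length (`zip` would silently truncate to the shorter one otherwise). The names `rows` and `result` are illustrative.

```python
import pandas as pd

L1 = [['ID1', ('key1a', 'key1b', 'key1c'), ('value1a', 'value1b', 'value1c')],
      ['ID2', ('key2a', 'key2b', 'key2c'), ('value2a', 'value2b', 'value2c')]]
df1 = pd.DataFrame(L1, columns=['ID', 'Key', 'Value'])

# Walk the rows as plain tuples and emit one (ID, key, value) triple
# for each matching pair of elements from the Key and Value tuples.
rows = [
    (row_id, key, value)
    for row_id, keys, values in df1.itertuples(index=False, name=None)
    for key, value in zip(keys, values)
]

# Rebuild a DataFrame with the original column names.
result = pd.DataFrame(rows, columns=df1.columns)

print(result)
```

On pandas 1.3 or later, `df1.explode(['Key', 'Value'], ignore_index=True)` should give the same rows in one call, since `explode` accepts a list of columns from that version on.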