Assuming the following DataFrame:
key.0 key.1 key.2 topic 1 abc def ghi 8 2 xab xcd xef 9
How can I combine the values of all the key.* columns into a single column 'key', that's associated with the topic value corresponding to the key.* columns? This is the result I want:
topic key 1 8 abc 2 8 def 3 8 ghi 4 9 xab 5 9 xcd 6 9 xef
Note that the number of key.N columns is variable on some external N.
Pandas DataFrame: stack() functionThe stack() function is used to stack the prescribed level(s) from columns to index. Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame.
Pandas provides various built-in methods for reshaping DataFrame. Among them, stack() and unstack() are the 2 most popular methods for restructuring columns and rows (also known as index). stack() : stack the prescribed level(s) from column to row. unstack() : unstack the prescribed level(s) from row to column.
You can melt your dataframe:
>>> keys = [c for c in df if c.startswith('key.')] >>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key') topic variable key 0 8 key.0 abc 1 9 key.0 xab 2 8 key.1 def 3 9 key.1 xcd 4 8 key.2 ghi 5 9 key.2 xef
It also gives you the source of the key.
From v0.20
, melt
is a first class function of the pd.DataFrame
class:
>>> df.melt('topic', value_name='key').drop('variable', 1) topic key 0 8 abc 1 9 xab 2 8 def 3 9 xcd 4 8 ghi 5 9 xef
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With