How can I tidy (melt) data in Pandas and keep all other columns?

Tags:

python

pandas

Consider this Pandas dataframe:

import pandas as pd

df = pd.DataFrame({
    'User ID': [1, 2, 2, 3],
    'Cupcakes': [1, 5, 4, 2],
    'Biscuits': [2, 5, 3, 3],
    'Score': [0.65, 0.12, 0.15, 0.9]
})

i.e.:

   User ID  Cupcakes  Biscuits  Score
0        1         1         2   0.65
1        2         5         5   0.12
2        2         4         3   0.15
3        3         2         3   0.90

I want to tidy ("melt") this data so that the dessert types become separate observations. But I also want to keep the score for each user.

Using melt() directly doesn't work:

df.melt(
    id_vars=['User ID'],
    value_vars=['Cupcakes', 'Biscuits'],
    var_name='Dessert', value_name='Enjoyment'
)

...gives:

   User ID   Dessert  Enjoyment
0        1  Cupcakes          1
1        2  Cupcakes          5
2        2  Cupcakes          4
3        3  Cupcakes          2
4        1  Biscuits          2
5        2  Biscuits          5
6        2  Biscuits          3
7        3  Biscuits          3

I've lost the score data!

I can't use wide_to_long() because I don't have a common "stub name" for my dessert types.

I can't join or merge the tidied data with the original data because the tidied data is reindexed and the user ID is not unique for each observation.
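As an aside on the join route: it can work if the melted frame keeps the original row labels, because then each melted row can be joined back to its Score by index instead of by the non-unique User ID. A minimal sketch, assuming pandas 1.1 or later (where melt() gained the ignore_index parameter); the name tidy is just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'User ID': [1, 2, 2, 3],
    'Cupcakes': [1, 5, 4, 2],
    'Biscuits': [2, 5, 3, 3],
    'Score': [0.65, 0.12, 0.15, 0.9]
})

# ignore_index=False (pandas >= 1.1) keeps the original row labels,
# repeating them once per melted dessert.
tidy = df.melt(
    id_vars=['User ID'],
    value_vars=['Cupcakes', 'Biscuits'],
    var_name='Dessert', value_name='Enjoyment',
    ignore_index=False,
)

# Join on the index (not on User ID, which is not unique) to bring Score back.
tidy = tidy.join(df[['Score']])
print(tidy)
```

Listing Score in id_vars, as in the answer, is simpler; this only shows why the index, not User ID, is the key to join on.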

How do I tidy this data but retain columns that aren't involved in the tidying?


asked Jul 24 '19 05:07

detly

1 Answer

Add the Score column to id_vars in DataFrame.melt. From the documentation:

id_vars : tuple, list, or ndarray, optional

Column(s) to use as identifier variables.

df1 = df.melt(
    id_vars=['User ID', 'Score'],
    value_vars=['Cupcakes', 'Biscuits'],
    var_name='Dessert', value_name='Enjoyment'
)
print(df1)
   User ID  Score   Dessert  Enjoyment
0        1   0.65  Cupcakes          1
1        2   0.12  Cupcakes          5
2        2   0.15  Cupcakes          4
3        3   0.90  Cupcakes          2
4        1   0.65  Biscuits          2
5        2   0.12  Biscuits          5
6        2   0.15  Biscuits          3
7        3   0.90  Biscuits          3
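To check that adding Score to id_vars loses nothing, here is a sketch that melts and then pivots back to wide form with set_index()/unstack(). It assumes (User ID, Score) identifies each original row, which holds for this sample data but is not guaranteed in general; the name wide is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'User ID': [1, 2, 2, 3],
    'Cupcakes': [1, 5, 4, 2],
    'Biscuits': [2, 5, 3, 3],
    'Score': [0.65, 0.12, 0.15, 0.9]
})

df1 = df.melt(
    id_vars=['User ID', 'Score'],
    value_vars=['Cupcakes', 'Biscuits'],
    var_name='Dessert', value_name='Enjoyment'
)

# Pivot back: one column per dessert (unstack sorts them alphabetically).
# Assumes (User ID, Score) is unique per original row.
wide = (
    df1.set_index(['User ID', 'Score', 'Dessert'])['Enjoyment']
       .unstack('Dessert')
       .reset_index()
)
wide.columns.name = None  # drop the leftover 'Dessert' axis label
print(wide)
```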

If you need to melt all columns other than User ID and Score, omit value_vars:

df.melt(
    id_vars=['User ID', 'Score'],
    var_name='Dessert', value_name='Enjoyment'
)
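A small sketch comparing the two calls on the sample data: when value_vars is omitted, melt() unpivots every column not listed in id_vars, which here is exactly Cupcakes and Biscuits in their original column order, so the result should match the explicit version. The names explicit and implicit are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'User ID': [1, 2, 2, 3],
    'Cupcakes': [1, 5, 4, 2],
    'Biscuits': [2, 5, 3, 3],
    'Score': [0.65, 0.12, 0.15, 0.9]
})

explicit = df.melt(
    id_vars=['User ID', 'Score'],
    value_vars=['Cupcakes', 'Biscuits'],
    var_name='Dessert', value_name='Enjoyment'
)

# No value_vars: every column not in id_vars gets melted.
implicit = df.melt(
    id_vars=['User ID', 'Score'],
    var_name='Dessert', value_name='Enjoyment'
)

print(implicit.equals(explicit))
```

The equality only holds because Cupcakes and Biscuits are the only other columns; any extra column would also end up in Dessert.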


answered Sep 19 '22 17:09

jezrael
