
How can I tidy (melt) data in Pandas and keep all other columns?

Tags: python, pandas

Consider this Pandas dataframe:

import pandas as pd

df = pd.DataFrame({
    'User ID': [1, 2, 2, 3],
    'Cupcakes': [1, 5, 4, 2],
    'Biscuits': [2, 5, 3, 3],
    'Score': [0.65, 0.12, 0.15, 0.9]
})

i.e.:

   User ID  Cupcakes  Biscuits  Score
0        1         1         2   0.65
1        2         5         5   0.12
2        2         4         3   0.15
3        3         2         3   0.90

I want to tidy ("melt") this data so that each dessert type is a separate observation, but I also want to keep the score for each user.

Using melt() directly doesn't work:

df.melt(
    id_vars=['User ID'],
    value_vars=['Cupcakes', 'Biscuits'],
    var_name='Dessert', value_name='Enjoyment'
)

...gives:

   User ID   Dessert  Enjoyment
0        1  Cupcakes          1
1        2  Cupcakes          5
2        2  Cupcakes          4
3        3  Cupcakes          2
4        1  Biscuits          2
5        2  Biscuits          5
6        2  Biscuits          3
7        3  Biscuits          3

I've lost the score data!

I can't use wide_to_long() because I don't have a common "stub name" for my dessert types.

I can't join or merge the tidied data with the original data because the tidied data is reindexed and the user ID is not unique for each observation.

How do I tidy this data but retain columns that aren't involved in the tidying?

asked Jul 24 '19 05:07 by detly



1 Answer

Add the Score column to id_vars in DataFrame.melt:

id_vars : tuple, list, or ndarray, optional

Column(s) to use as identifier variables.

df1 = df.melt(
    id_vars=['User ID', 'Score'],
    value_vars=['Cupcakes', 'Biscuits'],
    var_name='Dessert', value_name='Enjoyment'
)
print(df1)
   User ID  Score   Dessert  Enjoyment
0        1   0.65  Cupcakes          1
1        2   0.12  Cupcakes          5
2        2   0.15  Cupcakes          4
3        3   0.90  Cupcakes          2
4        1   0.65  Biscuits          2
5        2   0.12  Biscuits          5
6        2   0.15  Biscuits          3
7        3   0.90  Biscuits          3
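
When there are many columns to keep, listing each one in id_vars by hand gets tedious. A minimal sketch, assuming you know which columns you want to unpivot: build id_vars as every column that is not in value_vars. The names value_vars, id_vars and tidy below are just illustrative variables, not part of the pandas API.

```python
import pandas as pd

df = pd.DataFrame({
    'User ID': [1, 2, 2, 3],
    'Cupcakes': [1, 5, 4, 2],
    'Biscuits': [2, 5, 3, 3],
    'Score': [0.65, 0.12, 0.15, 0.9]
})

# Columns to unpivot; every other column is kept as an identifier.
value_vars = ['Cupcakes', 'Biscuits']
id_vars = [col for col in df.columns if col not in value_vars]

tidy = df.melt(
    id_vars=id_vars,
    value_vars=value_vars,
    var_name='Dessert', value_name='Enjoyment'
)
print(tidy.columns.tolist())
# ['User ID', 'Score', 'Dessert', 'Enjoyment']
```

This way any column added to the dataframe later is carried along automatically, without touching the melt call.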

If you need to melt every column except User ID and Score, omit value_vars:

df.melt(
    id_vars=['User ID', 'Score'],
    var_name='Dessert', value_name='Enjoyment'
)
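
The question mentions that joining back to the original data fails because the melted frame is reindexed. As an alternative sketch, assuming pandas 1.1 or later, melt accepts ignore_index=False, which keeps the original row labels so the remaining columns can be joined back on the index. The names long_df, others and joined are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'User ID': [1, 2, 2, 3],
    'Cupcakes': [1, 5, 4, 2],
    'Biscuits': [2, 5, 3, 3],
    'Score': [0.65, 0.12, 0.15, 0.9]
})

# Melt only the dessert columns, but keep the original row labels
# (ignore_index=False is an assumption about your pandas version: 1.1+).
long_df = df.melt(
    id_vars=['User ID'],
    value_vars=['Cupcakes', 'Biscuits'],
    var_name='Dessert', value_name='Enjoyment',
    ignore_index=False
)

# The original index identifies each source row uniquely, even though
# 'User ID' does not, so an index join restores the untouched columns.
others = df.drop(columns=['User ID', 'Cupcakes', 'Biscuits'])
joined = long_df.join(others)
print(joined)
```

Adding Score to id_vars is simpler for this case; the index join is mainly useful when the extra columns live in a separate frame that shares the original index.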
answered Sep 19 '22 17:09 by jezrael