Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas Melt Function

Tags:

I have a dataframe:

df = pd.DataFrame([[2, 4, 7, 8, 1, 3, 2013], [9, 2, 4, 5, 5, 6, 2014]], columns=['Amy', 'Bob', 'Carl', 'Chris', 'Ben', 'Other', 'Year']) 
   Amy  Bob  Carl  Chris  Ben  Other  Year 0    2    4     7      8    1      3  2013 1    9    2     4      5    5      6  2014 

And a dictionary:

d = {'A': ['Amy'], 'B': ['Bob', 'Ben'], 'C': ['Carl', 'Chris']} 

I would like to reshape my dataframe to look like this:

    Group   Name  Year  Value  0      A    Amy  2013      2  1      A    Amy  2014      9  2      B    Bob  2013      4  3      B    Bob  2014      2  4      B    Ben  2013      1  5      B    Ben  2014      5  6      C   Carl  2013      7  7      C   Carl  2014      4  8      C  Chris  2013      8  9      C  Chris  2014      5 10  Other         2013      3 11  Other         2014      6 

Note that Other doesn't have any values in the Name column and the order of the rows does not matter. I think I should be using the melt function but the examples that I've come across aren't too clear.

like image 855
slaw Avatar asked Jan 16 '16 18:01

slaw


People also ask

What does melt function do in pandas?

Pandas melt() function is used to change the DataFrame format from wide to long. It's used to create a specific format of the DataFrame object where one or more columns work as identifiers. All the remaining columns are treated as values and unpivoted to the row axis and only two columns - variable and value.

What does melt () do?

melt() function is useful to message a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are unpivoted to the row axis, leaving just two non-identifier columns, variable and value.

What melted data?

The melt() function is used to convert a data frame with several measurement columns into a data frame in this canonical format, which has one row for every observed (measured) value.

How do I transpose specific rows in pandas?

Pandas DataFrame. transpose() is a library function that transpose index and columns. The transpose reflects the DataFrame over its main diagonal by writing rows as columns and vice-versa. Use the T attribute or the transpose() method to swap (= transpose) the rows and columns of DataFrame.


2 Answers

melt gets you part way there.

In [29]: m = pd.melt(df, id_vars=['Year'], var_name='Name') 

This has everything except Group. To get that, we need to reshape d a bit as well.

In [30]: d2 = {}  In [31]: for k, v in d.items():     for item in v:         d2[item] = k    ....:  In [32]: d2 Out[32]: {'Amy': 'A', 'Ben': 'B', 'Bob': 'B', 'Carl': 'C', 'Chris': 'C'}  In [34]: m['Group'] = m['Name'].map(d2)  In [35]: m Out[35]:     Year   Name  value Group 0   2013    Amy      2     A 1   2014    Amy      9     A 2   2013    Bob      4     B 3   2014    Bob      2     B 4   2013   Carl      7     C ..   ...    ...    ...   ... 7   2014  Chris      5     C 8   2013    Ben      1     B 9   2014    Ben      5     B 10  2013  Other      3   NaN 11  2014  Other      6   NaN  [12 rows x 4 columns] 

And moving 'Other' from Name to Group

In [8]: mask = m['Name'] == 'Other'  In [9]: m.loc[mask, 'Name'] = ''  In [10]: m.loc[mask, 'Group'] = 'Other'  In [11]: m Out[11]:     Year   Name  value  Group 0   2013    Amy      2      A 1   2014    Amy      9      A 2   2013    Bob      4      B 3   2014    Bob      2      B 4   2013   Carl      7      C ..   ...    ...    ...    ... 7   2014  Chris      5      C 8   2013    Ben      1      B 9   2014    Ben      5      B 10  2013             3  Other 11  2014             6  Other  [12 rows x 4 columns] 
like image 67
TomAugspurger Avatar answered Nov 09 '22 13:11

TomAugspurger


Pandas Melt Function :-

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.

eg:-

melted = pd.melt(df, id_vars=["weekday"],               var_name="Person", value_name="Score") 

we use melt to transform wide data to long data.

like image 45
HeadAndTail Avatar answered Nov 09 '22 12:11

HeadAndTail