I have a dataframe:
df = pd.DataFrame([[2, 4, 7, 8, 1, 3, 2013], [9, 2, 4, 5, 5, 6, 2014]], columns=['Amy', 'Bob', 'Carl', 'Chris', 'Ben', 'Other', 'Year'])
Amy Bob Carl Chris Ben Other Year 0 2 4 7 8 1 3 2013 1 9 2 4 5 5 6 2014
And a dictionary:
d = {'A': ['Amy'], 'B': ['Bob', 'Ben'], 'C': ['Carl', 'Chris']}
I would like to reshape my dataframe to look like this:
Group Name Year Value 0 A Amy 2013 2 1 A Amy 2014 9 2 B Bob 2013 4 3 B Bob 2014 2 4 B Ben 2013 1 5 B Ben 2014 5 6 C Carl 2013 7 7 C Carl 2014 4 8 C Chris 2013 8 9 C Chris 2014 5 10 Other 2013 3 11 Other 2014 6
Note that Other
doesn't have any values in the Name
column and the order of the rows does not matter. I think I should be using the melt
function but the examples that I've come across aren't too clear.
Pandas melt() function is used to change the DataFrame format from wide to long. It's used to create a specific format of the DataFrame object where one or more columns work as identifiers. All the remaining columns are treated as values and unpivoted to the row axis and only two columns - variable and value.
melt() function is useful to message a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are unpivoted to the row axis, leaving just two non-identifier columns, variable and value.
The melt() function is used to convert a data frame with several measurement columns into a data frame in this canonical format, which has one row for every observed (measured) value.
Pandas DataFrame. transpose() is a library function that transpose index and columns. The transpose reflects the DataFrame over its main diagonal by writing rows as columns and vice-versa. Use the T attribute or the transpose() method to swap (= transpose) the rows and columns of DataFrame.
melt
gets you part way there.
In [29]: m = pd.melt(df, id_vars=['Year'], var_name='Name')
This has everything except Group
. To get that, we need to reshape d
a bit as well.
In [30]: d2 = {} In [31]: for k, v in d.items(): for item in v: d2[item] = k ....: In [32]: d2 Out[32]: {'Amy': 'A', 'Ben': 'B', 'Bob': 'B', 'Carl': 'C', 'Chris': 'C'} In [34]: m['Group'] = m['Name'].map(d2) In [35]: m Out[35]: Year Name value Group 0 2013 Amy 2 A 1 2014 Amy 9 A 2 2013 Bob 4 B 3 2014 Bob 2 B 4 2013 Carl 7 C .. ... ... ... ... 7 2014 Chris 5 C 8 2013 Ben 1 B 9 2014 Ben 5 B 10 2013 Other 3 NaN 11 2014 Other 6 NaN [12 rows x 4 columns]
And moving 'Other' from Name
to Group
In [8]: mask = m['Name'] == 'Other' In [9]: m.loc[mask, 'Name'] = '' In [10]: m.loc[mask, 'Group'] = 'Other' In [11]: m Out[11]: Year Name value Group 0 2013 Amy 2 A 1 2014 Amy 9 A 2 2013 Bob 4 B 3 2014 Bob 2 B 4 2013 Carl 7 C .. ... ... ... ... 7 2014 Chris 5 C 8 2013 Ben 1 B 9 2014 Ben 5 B 10 2013 3 Other 11 2014 6 Other [12 rows x 4 columns]
Pandas Melt Function :-
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.
eg:-
melted = pd.melt(df, id_vars=["weekday"], var_name="Person", value_name="Score")
we use melt to transform wide data to long data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With