I'm trying to reshape my data. At first glance, it sounds like a transpose, but it's not. I tried melts, stack/unstack, joins, etc.
Use Case
I want only one row per unique individual, with all of the job history laid out across the columns. For clients, it can be easier to read information across a row than down through columns.
Here's the data:
import pandas as pd
import numpy as np

data1 = {'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
         'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
         'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]}

df2 = pd.DataFrame(data1, columns=['Name', 'Job', 'Job Eff Date'])
df2
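Which looks like:

   Name       Job Job Eff Date
0   Joe   Analyst     1/1/2015
1   Joe   Manager     1/1/2016
2   Joe  Director     7/1/2016
3  Jane   Analyst     1/1/2015
4  Jane   Manager     1/1/2016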
Here's what I want it to look like:
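Something along these lines, with one row per person and the job history unrolled across the columns (the exact column labels don't matter):

Name   Job 1    Job 1 Eff Date   Job 2    Job 2 Eff Date   Job 3     Job 3 Eff Date
Joe    Analyst  1/1/2015         Manager  1/1/2016         Director  7/1/2016
Jane   Analyst  1/1/2015         Manager  1/1/2016         NaN       NaN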
You can use .T within a groupby:
def tgrp(df):
    # each group arrives here as a sub-dataframe; drop the grouping column,
    # renumber its rows 0, 1, 2, ..., then transpose
    df = df.drop('Name', axis=1)
    return df.reset_index(drop=True).T

df2.groupby('Name').apply(tgrp).unstack()
groupby returns an object that contains information on how the original series or dataframe has been grouped. Instead of performing a groupby with a subsequent action of some sort, we could first assign df2.groupby('Name') to a variable (I often do), say gb.

gb = df2.groupby('Name')
On this object gb
we could call .mean()
to get an average of each group. Or .last()
to get the last element (row) of each group. Or .transform(lambda x: (x - x.mean()) / x.std())
to get a zscore transformation within each group. When there is something you want to do within a group that doesn't have a predefined function, there is still .apply()
.
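For example, a minimal sketch. Note that .mean() and the z-score .transform() need numeric data, which df2 doesn't have, so the salary frame below is made up for illustration:

gb = df2.groupby('Name')
gb.last()   # last row of each group: Jane -> Manager, Joe -> Director

# df_sal and its Salary column are hypothetical, just to show the numeric methods
df_sal = pd.DataFrame({'Name': ["Joe", "Joe", "Jane", "Jane"],
                       'Salary': [50, 60, 55, 65]})
df_sal.groupby('Name')['Salary'].transform(lambda x: (x - x.mean()) / x.std())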
.apply() for a groupby object is different than it is for a dataframe. For a dataframe, .apply() takes a callable object as its argument and applies that callable to each column (or row) in the object. The object that is passed to that callable is a pd.Series. When you are using .apply in a dataframe context, it is helpful to keep this fact in mind. In the context of a groupby object, the object passed to the callable argument is a dataframe. In fact, that dataframe is one of the groups specified by the groupby.

When I write such functions to pass to groupby.apply, I typically define the parameter as df to reflect that it is a dataframe.
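A quick way to check this yourself (recent pandas versions may warn about the grouping column being included in the groups):

df2.apply(lambda obj: type(obj).__name__)                   # 'Series' for each column
df2.groupby('Name').apply(lambda obj: type(obj).__name__)   # 'DataFrame' for each group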
Ok, so we have:
df2.groupby('Name').apply(tgrp)
This generates a sub-dataframe for each 'Name' and passes that sub-dataframe to the function tgrp. Then the groupby object recombines all of the groups, each having gone through the tgrp function, back together again.

It'll look roughly like this (Jane has no third job, so her column 2 comes out NaN):
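                         0         1         2
Name
Jane Job           Analyst   Manager       NaN
     Job Eff Date 1/1/2015  1/1/2016       NaN
Joe  Job           Analyst   Manager  Director
     Job Eff Date 1/1/2015  1/1/2016  7/1/2016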
I took the OP's original attempt to simply transpose to heart. But I had to do some things first. Had I simply done:
df2[df2.Name == 'Jane'].T
df2[df2.Name == 'Joe'].T
Combining these manually (without groupby):
pd.concat([df2[df2.Name == 'Jane'].T, df2[df2.Name == 'Joe'].T])
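The result comes out something like this (column order can differ across pandas versions):

                     3         4         0         1         2
Name              Jane      Jane       NaN       NaN       NaN
Job            Analyst   Manager       NaN       NaN       NaN
Job Eff Date  1/1/2015  1/1/2016       NaN       NaN       NaN
Name               NaN       NaN       Joe       Joe       Joe
Job                NaN       NaN   Analyst   Manager  Director
Job Eff Date       NaN       NaN  1/1/2015  1/1/2016  7/1/2016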
Whoa! Now that's ugly. Obviously the index values of [0, 1, 2] don't mesh with [3, 4]. So let's reset.
pd.concat([df2[df2.Name == 'Jane'].reset_index(drop=True).T,
           df2[df2.Name == 'Joe'].reset_index(drop=True).T])
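Which gives, roughly:

                     0         1         2
Name              Jane      Jane       NaN
Job            Analyst   Manager       NaN
Job Eff Date  1/1/2015  1/1/2016       NaN
Name               Joe       Joe       Joe
Job            Analyst   Manager  Director
Job Eff Date  1/1/2015  1/1/2016  7/1/2016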
That's much better. But now we are getting into the territory groupby was intended to handle. So let it handle it.
Back to

df2.groupby('Name').apply(tgrp)

The only thing missing here is that we want to unstack the results to get the desired output.
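df2.groupby('Name').apply(tgrp).unstack()

This should come out looking something like the following, with a two-level column index of position and field (exact NaN rendering may vary by pandas version):

              0                       1                       2
            Job Job Eff Date       Job Job Eff Date        Job Job Eff Date
Name
Jane    Analyst      1/1/2015  Manager      1/1/2016       NaN          NaN
Joe     Analyst      1/1/2015  Manager      1/1/2016  Director      7/1/2016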