Reshape pandas dataframe from rows to columns

Tags:

python

pandas

dataframe

pandas-groupby

reshape

I'm trying to reshape my data. At first glance, it sounds like a transpose, but it's not. I tried melt, stack/unstack, joins, etc.

Use Case

I want only one row per unique individual, with all job history spread across the columns. For clients, it can be easier to read information across a row than down a column.

Here's the data:

import pandas as pd
import numpy as np

data1 = {'Name': ["Joe", "Joe", "Joe","Jane","Jane"],
        'Job': ["Analyst","Manager","Director","Analyst","Manager"],
        'Job Eff Date': ["1/1/2015","1/1/2016","7/1/2016","1/1/2015","1/1/2016"]}
df2 = pd.DataFrame(data1, columns=['Name', 'Job', 'Job Eff Date'])

df2

Here's what I want it to look like:

[Image: Desired Output Table]


asked Jul 31 '16 07:07

Christopher

1 Answer

.T within groupby

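def tgrp(df):
    # Drop 'Name' if present (recent pandas may already exclude the grouping
    # column), renumber rows 0..n-1, and transpose so each job is a column.
    df = df.drop(columns='Name', errors='ignore')
    return df.reset_index(drop=True).T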

df2.groupby('Name').apply(tgrp).unstack()

[Image: output of df2.groupby('Name').apply(tgrp).unstack()]
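A self-contained sketch of the whole pipeline, assuming a reasonably recent pandas. It uses a hypothetical helper `transpose_group`, a variant of `tgrp` that expects the value columns to be selected before `apply` (rather than dropping `'Name'` inside the function), which sidesteps version differences in whether the grouping column is handed to the function. The cross-check at the end uses a different technique than the answer (numbering each person's rows with `cumcount` and unstacking that counter), included only to confirm the two tables hold the same values.

```python
import pandas as pd

# The question's sample data.
data1 = {'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
         'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
         'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]}
df2 = pd.DataFrame(data1, columns=['Name', 'Job', 'Job Eff Date'])


def transpose_group(df):
    # Hypothetical variant of tgrp: assumes 'Name' was already left out.
    # Renumber the group's rows 0..n-1 and transpose so each job is a column.
    return df.reset_index(drop=True).T


# Selecting the value columns keeps 'Name' out of each group passed to apply.
wide = df2.groupby('Name')[['Job', 'Job Eff Date']].apply(transpose_group).unstack()
print(wide)

# Cross-check with a different technique: number each person's jobs with
# cumcount, then move that counter into the columns with unstack.
alt = (df2.assign(n=df2.groupby('Name').cumcount())
          .set_index(['Name', 'n'])
          .unstack('n'))

# alt has the field name on the outer column level; flip it to match wide.
alt = alt.swaplevel(axis=1).sort_index(axis=1)
print(alt)
```

The `cumcount` route avoids calling a Python function once per group, which may matter on large frames; the `apply` route mirrors the transpose idea more directly.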

Explanation

groupby returns an object that contains information on how the original series or dataframe has been grouped. Instead of performing a groupby followed immediately by some action, we could first assign df2.groupby('Name') to a variable, say gb (I often do this).

gb = df2.groupby('Name')
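A minimal sketch of inspecting `gb` before computing anything, re-creating the question's `df2` so it runs on its own. `ngroups`, `groups` and `get_group` are standard `GroupBy` members; the exact printed form of `groups` can vary between pandas versions.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                    'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                    'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]})

gb = df2.groupby('Name')

# Nothing has been aggregated yet; gb only records which rows belong where.
print(gb.ngroups)            # number of distinct names
print(gb.groups)             # group key -> row labels of df2 in that group
print(gb.get_group('Jane'))  # the sub-dataframe for a single key
```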

On this object gb we could call .mean() to get the average of each group's numeric columns, .last() to get the last non-null value of each column within each group, or .transform(lambda x: (x - x.mean()) / x.std()) to get a z-score transformation (again of numeric data) within each group. When there is something you want to do within a group that doesn't have a predefined function, there is still .apply().
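Because `df2` has no numeric columns, this sketch adds a hypothetical `Salary` column with made-up values (not from the question) to show `.last()`, `.mean()` and the z-score `.transform` together. Note the shapes: `last()` and `mean()` return one row per group, while `transform` returns a result aligned with the original rows.

```python
import pandas as pd

df = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                   'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                   # Hypothetical numeric column, invented for this illustration.
                   'Salary': [50, 70, 90, 50, 70]})

gb = df.groupby('Name')

# One row per group: the last non-null value of each column in that group.
last_rows = gb.last()

# One value per group: the mean of the numeric column.
mean_salary = gb['Salary'].mean()

# Same length as df: each Salary standardized within its own group
# (x.std() uses the sample standard deviation, ddof=1).
z = gb['Salary'].transform(lambda x: (x - x.mean()) / x.std())
print(last_rows, mean_salary, z, sep='\n\n')
```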

.apply() for a groupby object is different from .apply() for a dataframe. For a dataframe, .apply() takes a callable as its argument and applies it to each column (or, with axis=1, each row) of the object; the object passed to that callable is a pd.Series. It is helpful to keep this in mind when using .apply in a dataframe context. In the context of a groupby object on a dataframe, the object passed to the callable is itself a dataframe. In fact, that dataframe is one of the groups specified by the groupby.

When I write such functions to pass to groupby.apply, I typically name the parameter df to reflect that it is a dataframe.
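A small sketch contrasting what each `.apply` hands to its callable, using the question's `df2`. Selecting `['Job', 'Job Eff Date']` before the groupby `apply` is an assumption made here to keep the grouping column out of each group, which avoids the deprecation warning some recent pandas versions emit when it is included.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                    'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                    'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]})

# DataFrame.apply: the callable receives one column (a Series) at a time.
frame_apply = df2.apply(lambda col: type(col).__name__)

# GroupBy.apply: the callable receives one whole group (a DataFrame) at a time.
group_apply = (df2.groupby('Name')[['Job', 'Job Eff Date']]
                  .apply(lambda g: type(g).__name__))

print(frame_apply)
print(group_apply)
```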

Ok, so we have:

df2.groupby('Name').apply(tgrp)

This splits df2 into a sub-dataframe for each 'Name' and passes each sub-dataframe to the function tgrp. The groupby object then stitches the outputs of tgrp back together, using the group names as the outer level of the index.

It'll look like this.

[Image: output of df2.groupby('Name').apply(tgrp)]
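To look at the intermediate result directly, here is a sketch using a hypothetical helper `transpose_group` (the answer's `tgrp` minus the `drop`, since the value columns are selected up front). It checks that the group names form the outer index level, the original column names form the inner level, and the columns are the renumbered positions shared across groups.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                    'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                    'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]})


def transpose_group(df):
    # Hypothetical helper: renumber rows 0..n-1, then transpose.
    return df.reset_index(drop=True).T


stacked = df2.groupby('Name')[['Job', 'Job Eff Date']].apply(transpose_group)

# Outer index level: group key; inner level: the original column names.
print(stacked.index.tolist())
# Columns: positions 0..2, aligned across groups (Jane has no position 2).
print(stacked.columns.tolist())
print(stacked)
```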

I took the OP's original instinct to simply transpose to heart, but a plain transpose needs some preparation first. Had I simply done:

df2[df2.Name == 'Jane'].T

[Image: output of df2[df2.Name == 'Jane'].T]

df2[df2.Name == 'Joe'].T

[Image: output of df2[df2.Name == 'Joe'].T]

Combining these manually (without groupby):

pd.concat([df2[df2.Name == 'Jane'].T, df2[df2.Name == 'Joe'].T])

[Image: output of the concatenation above]

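Whoa! Now that's ugly. After the transpose, each group's original index values become its column labels, and Joe's [0, 1, 2] don't line up with Jane's [3, 4], so concat ends up with five columns padded with NaN. So let's reset the index.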

pd.concat([df2[df2.Name == 'Jane'].reset_index(drop=True).T,
           df2[df2.Name == 'Joe'].reset_index(drop=True).T])

[Image: output of the concatenation with reset indexes]
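The two manual `concat` calls can be compared numerically with a sketch like this (same `df2` as the question): without the reset, the column labels are the original row labels, so there are five columns; with it, there are three, and the only missing values are those for Jane's nonexistent third job.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                    'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                    'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]})

jane = df2[df2.Name == 'Jane']
joe = df2[df2.Name == 'Joe']

# Transposed columns are the original row labels (Jane: 3, 4; Joe: 0, 1, 2),
# so the outer join in concat produces five mostly empty columns.
naive = pd.concat([jane.T, joe.T])

# After reset_index(drop=True) both groups are numbered from 0 and line up.
fixed = pd.concat([jane.reset_index(drop=True).T,
                   joe.reset_index(drop=True).T])

print(naive.shape, int(naive.isna().sum().sum()))
print(fixed.shape, int(fixed.isna().sum().sum()))
```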

That's much better. But now we are getting into the territory groupby was intended to handle. So let it handle it.

Back to

df2.groupby('Name').apply(tgrp)

The only thing missing is to unstack the result, moving the inner index level (the original column names) into the columns, to get the desired output.

[Image: output table]
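A sketch of what that final `.unstack()` does to the levels of the intermediate result, again using the hypothetical `transpose_group` helper on pre-selected columns. With no argument, `unstack` moves the last index level (the field names) into the columns, where it becomes the inner column level beneath the position numbers.

```python
import pandas as pd

df2 = pd.DataFrame({'Name': ["Joe", "Joe", "Joe", "Jane", "Jane"],
                    'Job': ["Analyst", "Manager", "Director", "Analyst", "Manager"],
                    'Job Eff Date': ["1/1/2015", "1/1/2016", "7/1/2016", "1/1/2015", "1/1/2016"]})


def transpose_group(df):
    # Hypothetical helper: renumber rows 0..n-1, then transpose.
    return df.reset_index(drop=True).T


stacked = df2.groupby('Name')[['Job', 'Job Eff Date']].apply(transpose_group)

# Before: two index levels (Name, field), one column level (position).
print(stacked.index.nlevels, stacked.columns.nlevels)

# unstack() moves the last index level (field) into the columns.
wide = stacked.unstack()

# After: one index level (Name), two column levels (position, field).
print(wide.index.nlevels, wide.columns.nlevels)
print(wide.columns.tolist())
```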


answered Oct 24 '22 06:10

piRSquared
