<p>Suppose we take a pandas dataframe...</p> <pre class="prettyprint"><code> name age family 0 john 1 1 1 jason 36 1 2 jane 32 1 3 jack 26 2 4 james 30 2 </code></pre> <p>Then do a <code>groupby()</code> ...</p> <pre class="prettyprint"><code>group_df = df.groupby('family') group_df = group_df.aggregate({'name': name_join, 'age': pd.np.mean}) </code></pre> <p>Then do some aggregate/summarize operation (in my example, my function <code>name_join</code> aggregates the names):</p> <pre class="prettyprint"><code>def name_join(list_names, concat='-'): return concat.join(list_names) </code></pre> <p>The grouped summarized output is thus:</p> <pre class="prettyprint"><code> age name family 1 23 john-jason-jane 2 28 jack-james </code></pre> <h3><strong>Question:</strong></h3> <p>Is there a quick, efficient way to get to the following from the aggregated table?</p> <pre class="prettyprint"><code> name age family 0 john 23 1 1 jason 23 1 2 jane 23 1 3 jack 28 2 4 james 28 2 </code></pre> <p>(Note: the <code>age</code> column values are just examples, I don't care for the information I am losing after averaging in this specific example)</p> <p>The way I thought I could do it does not look too efficient:</p> <ol> <li>create empty dataframe</li> <li>from every line in <code>group_df</code>, separate the names</li> <li>return a dataframe with as many rows as there are names in the starting row</li> <li>append the output to the empty dataframe</li> </ol>

<p>It turns out that <code>pd.groupby()</code> returns an object with the original data stored in <code>obj</code>. So ungrouping is just pulling out the original data.</p> <pre class="prettyprint"><code>group_df = df.groupby('family') group_df.obj </code></pre> <h3>Example</h3> <pre class="prettyprint"><code>>>> dat_1 = df.groupby("category_2") >>> dat_1 <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fce78b3dd00> >>> dat_1.obj order_date category_2 value 1 2011-02-01 Cross Country Race 324400.0 2 2011-03-01 Cross Country Race 142000.0 3 2011-04-01 Cross Country Race 498580.0 4 2011-05-01 Cross Country Race 220310.0 5 2011-06-01 Cross Country Race 364420.0 .. ... ... ... 535 2015-08-01 Triathalon 39200.0 536 2015-09-01 Triathalon 75600.0 537 2015-10-01 Triathalon 58600.0 538 2015-11-01 Triathalon 70050.0 539 2015-12-01 Triathalon 38600.0 [531 rows x 3 columns] </code></pre>

Is there an "ungroup by" operation opposite to .groupby in pandas?

Tags:

python

pandas

group-by

pandas-groupby

Suppose we take a pandas dataframe...

    name  age  family
0   john    1       1
1  jason   36       1
2   jane   32       1
3   jack   26       2
4  james   30       2

Then do a groupby() ...

group_df = df.groupby('family')
group_df = group_df.aggregate({'name': name_join, 'age': pd.np.mean})

Then do some aggregate/summarize operation (in my example, my function name_join aggregates the names):

def name_join(list_names, concat='-'):
    return concat.join(list_names)

The grouped summarized output is thus:

        age             name
family                      
1        23  john-jason-jane
2        28       jack-james

Question:

Is there a quick, efficient way to get to the following from the aggregated table?

    name  age  family
0   john   23       1
1  jason   23       1
2   jane   23       1
3   jack   28       2
4  james   28       2

(Note: the age column values are just examples, I don't care for the information I am losing after averaging in this specific example)

The way I thought I could do it does not look too efficient:

create empty dataframe
from every line in group_df, separate the names
return a dataframe with as many rows as there are names in the starting row
append the output to the empty dataframe

370

asked Nov 21 '13 13:11

mkln

3 Answers

The rough equivalent is .reset_index(), but it may not be helpful to think of it as the "opposite" of groupby().

You are splitting a string in to pieces, and maintaining each piece's association with 'family'. This old answer of mine does the job.

Just set 'family' as the index column first, refer to the link above, and then reset_index() at the end to get your desired result.

answered Nov 01 '22 18:11

Dan Allan

It turns out that pd.groupby() returns an object with the original data stored in obj. So ungrouping is just pulling out the original data.

group_df = df.groupby('family')
group_df.obj

Example

>>> dat_1 = df.groupby("category_2")
>>> dat_1
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fce78b3dd00>
>>> dat_1.obj
    order_date          category_2     value
1   2011-02-01  Cross Country Race  324400.0
2   2011-03-01  Cross Country Race  142000.0
3   2011-04-01  Cross Country Race  498580.0
4   2011-05-01  Cross Country Race  220310.0
5   2011-06-01  Cross Country Race  364420.0
..         ...                 ...       ...
535 2015-08-01          Triathalon   39200.0
536 2015-09-01          Triathalon   75600.0
537 2015-10-01          Triathalon   58600.0
538 2015-11-01          Triathalon   70050.0
539 2015-12-01          Triathalon   38600.0

[531 rows x 3 columns]

answered Nov 01 '22 16:11

Matt Dancho

Here's a complete example that recovers the original dataframe from the grouped object

def name_join(list_names, concat='-'):
    return concat.join(list_names)

print('create dataframe\n')
df = pandas.DataFrame({'name':['john', 'jason', 'jane', 'jack', 'james'], 'age':[1,36,32,26,30], 'family':[1,1,1,2,2]})
df.index.name='indexer'
print(df)
print('create group_by object')
group_obj_df = df.groupby('family')
print(group_obj_df)

print('\nrecover grouped df')
group_joined_df = group_obj_df.aggregate({'name': name_join, 'age': 'mean'})
group_joined_df


create dataframe

          name  age  family
indexer                    
0         john    1       1
1        jason   36       1
2         jane   32       1
3         jack   26       2
4        james   30       2
create group_by object
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fbfdd9dd048>

recover grouped df 
                   name  age
family                      
1       john-jason-jane   23
2            jack-james   28

print('\nRecover the original dataframe')
print(pandas.concat([group_obj_df.get_group(key) for key in group_obj_df.groups]))

Recover the original dataframe
          name  age  family
indexer                    
0         john    1       1
1        jason   36       1
2         jane   32       1
3         jack   26       2
4        james   30       2