
In pandas, how do I flatten a group of rows

I am new to pandas in Python and would be grateful for any help with this. I have been googling and googling but can't seem to crack it.

For example, I have a CSV file with 6 columns. I am trying to group the rows so that all the data for each event is flattened into one row.

So if my data looks like this:

event  event_date   event_time    name       height   age
1      2015-05-06   14:00         J Bloggs   185      24
1      2015-05-06   14:00         P Smith    176      55
1      2015-05-06   14:00         T Kirk     193      22
2      2015-05-14   17:00         B Gates    178      72
2      2015-05-14   17:00         J Mayer    184      42

and what I want to end up with is the data flattened like this:

event  event_date   event_time    name_1     height_1   age_1   name_2     height_2   age_2    name_3    height_3   age_3
1      2015-05-06   14:00         J Bloggs   185        24      P Smith    176        55       T Kirk    193        22
2      2015-05-14   17:00         B Gates    178        72      J Mayer    184        42

As you can see above, the three rows of the first event have been flattened into one row, and the columns have been expanded to accommodate the row data. The second event has been flattened in the same way, with its two rows filling the first two sets of columns.

Any help would be appreciated.

SpeedOfSpin asked Jan 24 '17 14:01


2 Answers

Steps:

1) Compute the cumulative count of rows within each group of the GroupBy object. Add 1 so that the numbering starts at 1, matching the headers of the desired DataFrame.

2) Set the grouping columns, together with the computed cumulative counts, as the index and then unstack the counts. Additionally, sort the column header by its lowest level (the counts).

3) Join the levels of the multi-index columns to flatten them into a single header.


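# 1) Number the rows within each event, starting at 1.
cc = df.groupby(['event','event_date','event_time']).cumcount() + 1
# 2) Index by the group keys plus the counter, unstack the counter, and sort the columns by it.
df = df.set_index(['event','event_date','event_time', cc]).unstack().sort_index(axis=1, level=1)
# 3) Join each (column, counter) pair into a single flat header.
df.columns = ['_'.join(map(str,i)) for i in df.columns]
df.reset_index()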
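The steps above can be tried end to end with the sample data from the question. This is only a sketch: the frame is rebuilt by hand rather than read from the CSV, and the names `keys`, `value_cols`, `wide` and `ordered` are illustrative and not part of the original answer. Instead of sorting the header, it selects the columns explicitly so that they come out in the name, height, age order the question asks for.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (assumed to match the CSV).
df = pd.DataFrame({
    "event": [1, 1, 1, 2, 2],
    "event_date": ["2015-05-06"] * 3 + ["2015-05-14"] * 2,
    "event_time": ["14:00"] * 3 + ["17:00"] * 2,
    "name": ["J Bloggs", "P Smith", "T Kirk", "B Gates", "J Mayer"],
    "height": [185, 176, 193, 178, 184],
    "age": [24, 55, 22, 72, 42],
})

keys = ["event", "event_date", "event_time"]   # illustrative helper names
value_cols = ["name", "height", "age"]

# Step 1: number the rows within each event, starting at 1.
cc = df.groupby(keys).cumcount() + 1

# Step 2: index by the keys plus the counter, then unstack the counter.
wide = df.set_index(keys + [cc]).unstack()

# Step 3: flatten the (column, counter) pairs into single headers.
wide.columns = [f"{col}_{n}" for col, n in wide.columns]

# Put the columns in the order requested in the question.
ordered = [f"{col}_{n}" for n in range(1, int(cc.max()) + 1) for col in value_cols]
wide = wide[ordered].reset_index()

print(wide)
```

Events with fewer rows than the largest event get missing values in the trailing columns, so the `height` and `age` columns may be converted to floats.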


Nickil Maveli answered Sep 18 '22 01:09


You are making a wide table from a long one. Usually in data analysis you would want to do the opposite. Here is a method that first numbers the rows within each event, so that each name, height and age gets a position, and then pivots them the way you want.

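# Number the rows within each event, starting at 1.
df['group_num'] = df.groupby(['event', 'event_date','event_time']).cumcount() + 1
df = df.sort_values('group_num')
# Stack name/height/age into one long column; the unnamed stacked level becomes 'level_4'.
df1 = df.set_index(['event', 'event_date','event_time', 'group_num']).stack().reset_index()
# Build the target headers, e.g. 'name_1'.
df1['var_names'] = df1['level_4'] + '_' + df1['group_num'].astype(str)
df1 = df1.drop(['group_num', 'level_4'], axis=1)
# Pivot the headers back out into columns.
df1.set_index(['event', 'event_date', 'event_time', 'var_names']).squeeze().unstack('var_names')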

var_names                   age_1 age_2 age_3 height_1 height_2 height_3  \
event event_date event_time                                                
1     2015-05-06 14:00         24    55    22      185      176      193   
2     2015-05-14 17:00         72    42  None      178      184     None   

var_names                      name_1   name_2  name_3  
event event_date event_time                             
1     2015-05-06 14:00       J Bloggs  P Smith  T Kirk  
2     2015-05-14 17:00        B Gates  J Mayer    None  
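
For the opposite direction mentioned above (wide back to long), `pd.wide_to_long` is one option. The sketch below assumes a wide frame shaped like the desired output in the question; the frame is typed in by hand, and the names `wide`, `tidy` and `group_num` are chosen for illustration only.

```python
import pandas as pd

# Hand-built wide frame shaped like the desired output in the question.
wide = pd.DataFrame({
    "event": [1, 2],
    "event_date": ["2015-05-06", "2015-05-14"],
    "event_time": ["14:00", "17:00"],
    "name_1": ["J Bloggs", "B Gates"],
    "height_1": [185, 178],
    "age_1": [24, 72],
    "name_2": ["P Smith", "J Mayer"],
    "height_2": [176, 184],
    "age_2": [55, 42],
    "name_3": ["T Kirk", None],
    "height_3": [193, None],
    "age_3": [22, None],
})

# Split headers such as 'name_1' into the stub 'name' and the numeric suffix 1.
tidy = pd.wide_to_long(
    wide,
    stubnames=["name", "height", "age"],
    i=["event", "event_date", "event_time"],
    j="group_num",
    sep="_",
)

# Drop the empty slots (event 2 has no third person) and restore a flat frame.
tidy = (
    tidy.dropna(how="all")
        .reset_index()
        .sort_values(["event", "group_num"])
        .reset_index(drop=True)
)

print(tidy)
```

This gives one row per person again, with `group_num` recording the position each person had within the event.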

Ted Petrou answered Sep 19 '22 01:09