I am new to pandas in python and I would be grateful for any help on this. I have been googling and googling but can't seem to crack it.
For example, I have a CSV file with 6 columns. I am trying to group the rows together so that all the data for each event is flattened into one row.
So if my data looks like this:
event event_date event_time name height age
1 2015-05-06 14:00 J Bloggs 185 24
1 2015-05-06 14:00 P Smith 176 55
1 2015-05-06 14:00 T Kirk 193 22
2 2015-05-14 17:00 B Gates 178 72
2 2015-05-14 17:00 J Mayer 184 42
and what I want to end up with is the data flattened like this:
event event_date event_time name_1 height_1 age_1 name_2 height_2 age_2 name_3 height_3 age_3
1 2015-05-06 14:00 J Bloggs 185 24 P Smith 176 55 T Kirk 193 22
2 2015-05-14 17:00 B Gates 178 72 J Mayer 184 42
So, as you can see above, the three rows of the first event have been flattened into one row and the columns expanded to accommodate the row data. The second event has been flattened in the same way and its columns filled with the data.
Any help would be appreciated.
Steps:
1) Compute the cumulative counts for the Groupby object. Add 1 so that the headers are formatted as per the desired DF.
2) Set the same grouped columns as the index axis along with the computed cumcounts and then unstack it. Additionally, sort the header according to the lowermost level.
3) Rename the multi-index columns and flatten accordingly to obtain a single header.
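# 1) number the rows within each event, starting at 1
cc = df.groupby(['event','event_date','event_time']).cumcount() + 1
# 2) unstack the counter into the columns and sort the header by it (axis must be passed by keyword in recent pandas)
df = df.set_index(['event','event_date','event_time', cc]).unstack().sort_index(axis=1, level=1)
# 3) flatten the two-level header, e.g. ('name', 1) -> 'name_1'
df.columns = ['_'.join(map(str,i)) for i in df.columns]
df.reset_index()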
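For anyone who wants to try the three steps end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand; the names `keys`, `value_cols`, `wide`, `flat` and `ordered` are only illustrative and not part of the original answer. The last step, which reorders the columns to match the layout asked for in the question, is an optional addition.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "event": [1, 1, 1, 2, 2],
    "event_date": ["2015-05-06"] * 3 + ["2015-05-14"] * 2,
    "event_time": ["14:00"] * 3 + ["17:00"] * 2,
    "name": ["J Bloggs", "P Smith", "T Kirk", "B Gates", "J Mayer"],
    "height": [185, 176, 193, 178, 184],
    "age": [24, 55, 22, 72, 42],
})

keys = ["event", "event_date", "event_time"]  # columns that identify one event
value_cols = ["name", "height", "age"]        # columns to spread across the row

# Step 1: number the rows inside each event, starting at 1.
cc = df.groupby(keys).cumcount() + 1

# Step 2: index by the keys plus the counter, then unstack the counter
# (the innermost index level) into the columns.
wide = df.set_index(keys + [cc]).unstack()

# Step 3: flatten the two-level header, e.g. ("name", 1) -> "name_1".
wide.columns = [f"{col}_{num}" for col, num in wide.columns]
flat = wide.reset_index()

# Optional (assumption, not in the original answer): put the columns in the
# order shown in the question: name_1, height_1, age_1, name_2, ...
ordered = [f"{col}_{i}" for i in range(1, int(cc.max()) + 1) for col in value_cols]
flat = flat[keys + ordered]
print(flat)
```

Events with fewer people than the largest event end up with missing values in the trailing columns, as in the second row here.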
You are making a wide table from a long one. Usually in data analysis you would want to do the opposite. Here is a method that first numbers the occurrences of each variable (name, height and age) within an event and then pivots them the way you want.
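# number the rows within each event, starting at 1
df['group_num'] = df.groupby(['event', 'event_date','event_time']).cumcount() + 1
df = df.sort_values('group_num')
# stack name/height/age into one long column; the unnamed stacked level becomes 'level_4'
df1 = df.set_index(['event', 'event_date','event_time', 'group_num']).stack().reset_index()
# build the final column names, e.g. 'name_1'
df1['var_names'] = df1['level_4'] + '_' + df1['group_num'].astype(str)
df1 = df1.drop(['group_num', 'level_4'], axis=1)
# pivot the new names back into the columns
df1.set_index(['event', 'event_date', 'event_time', 'var_names']).squeeze().unstack('var_names')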
var_names age_1 age_2 age_3 height_1 height_2 height_3 \
event event_date event_time
1 2015-05-06 14:00 24 55 22 185 176 193
2 2015-05-14 17:00 72 42 None 178 184 None
var_names name_1 name_2 name_3
event event_date event_time
1 2015-05-06 14:00 J Bloggs P Smith T Kirk
2 2015-05-14 17:00 B Gates J Mayer None
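The same count-then-pivot idea can probably be written more compactly with `DataFrame.pivot`. This is a swapped-in variant rather than the code from the answer above, and it assumes a pandas version in which `pivot` accepts a list for `index` (1.1 or later, as far as I know). The sample data is rebuilt from the question, and `keys` and `wide` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "event": [1, 1, 1, 2, 2],
    "event_date": ["2015-05-06"] * 3 + ["2015-05-14"] * 2,
    "event_time": ["14:00"] * 3 + ["17:00"] * 2,
    "name": ["J Bloggs", "P Smith", "T Kirk", "B Gates", "J Mayer"],
    "height": [185, 176, 193, 178, 184],
    "age": [24, 55, 22, 72, 42],
})

keys = ["event", "event_date", "event_time"]

# Number the rows within each event (1, 2, 3, ...).
df["group_num"] = df.groupby(keys).cumcount() + 1

# Pivot so that every (variable, group_num) pair becomes its own column.
wide = df.pivot(index=keys, columns="group_num", values=["name", "height", "age"])

# Flatten the two-level header into names such as "name_1".
wide.columns = [f"{var}_{num}" for var, num in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because the pivoted values mix strings and numbers, the resulting columns may come back with object dtype, so a conversion step might be needed before doing arithmetic on the height and age columns.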