I have dataframe:
time_to_rent = {'rentId': {0: 43.0, 1: 87.0, 2: 140.0, 3: 454.0, 4: 1458.0}, 'creditCardId': {0: 40, 1: 40, 2: 40, 3: 40, 4: 40}, 'createdAt': {0: Timestamp('2020-08-24 16:13:11.850216'), 1: Timestamp('2020-09-10 10:47:31.748628'), 2: Timestamp('2020-09-13 15:29:06.077622'), 3: Timestamp('2020-09-24 08:08:39.852348'), 4: Timestamp('2020-10-19 08:54:09.891518')}, 'updatedAt': {0: Timestamp('2020-08-24 20:26:31.805939'), 1: Timestamp('2020-09-10 20:05:18.759421'), 2: Timestamp('2020-09-13 18:38:10.044112'), 3: Timestamp('2020-09-24 08:53:22.512533'), 4: Timestamp('2020-10-19 17:10:09.110038')}, 'rent_time': {0: Timedelta('0 days 04:13:19.955723'), 1: Timedelta('0 days 09:17:47.010793'), 2: Timedelta('0 days 03:09:03.966490'), 3: Timedelta('0 days 00:44:42.660185'), 4: Timedelta('0 days 08:15:59.218520')}}
The idea to aggregate dataframe by column 'creditCardId' and have mean value of 'rent_time'. Ideal output should be:
creditCardId rent_time mean
40 0 days 05:08:10.562342
if I run code:
print (time_to_rent['rent_time'].mean())
it works fine and i have "0 days 05:08:10.562342" as output. But when i am trying to get grouping by:
time_to_rent.groupby('creditCardId', as_index=False)[['rent_time']].mean()
I got error back:
~\anaconda3\lib\site-packages\pandas\core\groupby\generic.py in _cython_agg_blocks(self, how, alt, numeric_only, min_count)
1093
1094 if not (agg_blocks or split_frames):
-> 1095 raise DataError("No numeric types to aggregate")
1096
1097 if split_items:
DataError: No numeric types to aggregate
if I use the command:
time_to_rent = time_to_rent.groupby('creditCardId', as_index=False)[['rent_time']]
it returns only "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000000000B5F2EE0>"
May you please help me understand where my mistake is?
The following is a step-by-step guide of what you need to do. Group the dataframe on the column(s) you want. Select the field(s) for which you want to estimate the mean. Apply the pandas mean() function directly or pass 'mean' to the agg() function.
groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.
To apply aggregations to multiple columns, just add additional key:value pairs to the dictionary. Applying multiple aggregation functions to a single column will result in a multiindex. Working with multi-indexed columns is a pain and I'd recommend flattening this after aggregating by renaming the new columns.
It's not your mistake, possibly a bug in Pandas since Timedelta
can be averaged. A work-around is apply
:
time_to_rent.groupby('creditCardId')['rent_time'].apply(lambda x: x.mean())
Output:
creditCardId
40 0 days 05:08:10.562342200
Name: rent_time, dtype: timedelta64[ns]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With