Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: mean() doesn't work when groupby aggregates dataframe to one line

I have dataframe:

time_to_rent = {'rentId': {0: 43.0, 1: 87.0, 2: 140.0, 3: 454.0, 4: 1458.0}, 'creditCardId': {0: 40, 1: 40, 2: 40, 3: 40, 4: 40}, 'createdAt': {0: Timestamp('2020-08-24 16:13:11.850216'), 1: Timestamp('2020-09-10 10:47:31.748628'), 2: Timestamp('2020-09-13 15:29:06.077622'), 3: Timestamp('2020-09-24 08:08:39.852348'), 4: Timestamp('2020-10-19 08:54:09.891518')}, 'updatedAt': {0: Timestamp('2020-08-24 20:26:31.805939'), 1: Timestamp('2020-09-10 20:05:18.759421'), 2: Timestamp('2020-09-13 18:38:10.044112'), 3: Timestamp('2020-09-24 08:53:22.512533'), 4: Timestamp('2020-10-19 17:10:09.110038')}, 'rent_time': {0: Timedelta('0 days 04:13:19.955723'), 1: Timedelta('0 days 09:17:47.010793'), 2: Timedelta('0 days 03:09:03.966490'), 3: Timedelta('0 days 00:44:42.660185'), 4: Timedelta('0 days 08:15:59.218520')}}

The idea to aggregate dataframe by column 'creditCardId' and have mean value of 'rent_time'. Ideal output should be:

creditCardId        rent_time mean
40                  0 days 05:08:10.562342

if I run code:

print (time_to_rent['rent_time'].mean())

it works fine and i have "0 days 05:08:10.562342" as output. But when i am trying to get grouping by:

time_to_rent.groupby('creditCardId', as_index=False)[['rent_time']].mean()

I got error back:

~\anaconda3\lib\site-packages\pandas\core\groupby\generic.py in _cython_agg_blocks(self, how, alt, numeric_only, min_count)
   1093 
   1094         if not (agg_blocks or split_frames):
-> 1095             raise DataError("No numeric types to aggregate")
   1096 
   1097         if split_items:

DataError: No numeric types to aggregate

if I use the command:

time_to_rent = time_to_rent.groupby('creditCardId', as_index=False)[['rent_time']]

it returns only "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000000000B5F2EE0>"

May you please help me understand where my mistake is?

like image 807
OcMaRUS Avatar asked Oct 20 '20 15:10

OcMaRUS


People also ask

How do you find the mean in Groupby in Python?

The following is a step-by-step guide of what you need to do. Group the dataframe on the column(s) you want. Select the field(s) for which you want to estimate the mean. Apply the pandas mean() function directly or pass 'mean' to the agg() function.

How does the Groupby () method works in pandas?

groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.

How do pandas use two aggregate functions?

To apply aggregations to multiple columns, just add additional key:value pairs to the dictionary. Applying multiple aggregation functions to a single column will result in a multiindex. Working with multi-indexed columns is a pain and I'd recommend flattening this after aggregating by renaming the new columns.


1 Answers

It's not your mistake, possibly a bug in Pandas since Timedelta can be averaged. A work-around is apply:

time_to_rent.groupby('creditCardId')['rent_time'].apply(lambda x: x.mean())

Output:

creditCardId
40   0 days 05:08:10.562342200
Name: rent_time, dtype: timedelta64[ns]
like image 50
Quang Hoang Avatar answered Oct 08 '22 09:10

Quang Hoang