
Finding the mean and standard deviation of a timedelta object in pandas df

I would like to calculate the mean and standard deviation of a timedelta column, grouped by bank, from the two-column dataframe shown below. When I run the code (also shown below), I get the following error:

pandas.core.base.DataError: No numeric types to aggregate 

My dataframe:

bank                          diff
Bank of Japan                 0 days 00:00:57.416000
Reserve Bank of Australia     0 days 00:00:21.452000
Reserve Bank of New Zealand  55 days 12:39:32.269000
U.S. Federal Reserve          8 days 13:27:11.387000

My code:

means = dropped.groupby('bank').mean()
std = dropped.groupby('bank').std()
asked Jun 18 '17 15:06 by Graham Streich




2 Answers

You need to convert the timedelta to a numeric value first. Converting to int64 via values is the most accurate option, because nanoseconds stored as int64 are the underlying numeric representation of a timedelta:

dropped['new'] = dropped['diff'].values.astype(np.int64)

means = dropped.groupby('bank').mean()
means['new'] = pd.to_timedelta(means['new'])

std = dropped.groupby('bank').std()
std['new'] = pd.to_timedelta(std['new'])
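
For anyone who wants to try the int64 route end to end, here is a minimal self-contained sketch. The sample rows are made up for illustration (two per bank, so that std() is defined), and the explicit cast to timedelta64[ns] is an assumption added to make sure the integers really are nanoseconds. It also selects the 'new' column before aggregating, so the result does not depend on how a given pandas version treats the original timedelta column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's real dataframe.
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["0 days 00:00:10", "0 days 00:00:30",
                             "1 days 00:00:00", "3 days 00:00:00"]),
})

# Force nanosecond resolution, then view the durations as int64 nanoseconds.
diff_ns = dropped["diff"].astype("timedelta64[ns]")
dropped["new"] = diff_ns.values.astype(np.int64)

# Aggregate only the numeric column, then convert the results back.
grouped = dropped.groupby("bank")["new"]
means = pd.to_timedelta(grouped.mean())  # integers/floats are read as ns
stds = pd.to_timedelta(grouped.std())

print(means)
print(stds)
```

The aggregated values come back as floats, so pd.to_timedelta interprets them as nanoseconds, which is its default unit for numeric input.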

Another solution is to convert the values to seconds with total_seconds, but that is less accurate because the result is a float:

dropped['new'] = dropped['diff'].dt.total_seconds()

means = dropped.groupby('bank').mean()
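
A small sketch of the seconds-based variant, again with made-up sample rows. The snippet above stops at the mean; converting back with unit="s" and computing std() the same way are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataframe.
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["0 days 00:00:10", "0 days 00:00:30",
                             "1 days 00:00:00", "3 days 00:00:00"]),
})

# Durations as float seconds; very fine-grained values may lose precision.
dropped["new"] = dropped["diff"].dt.total_seconds()

grouped = dropped.groupby("bank")["new"]
mean_seconds = grouped.mean()
std_seconds = grouped.std()

# Optional: turn the float seconds back into Timedelta values.
means = pd.to_timedelta(mean_seconds, unit="s")
stds = pd.to_timedelta(std_seconds, unit="s")

print(means)
print(stds)
```

Keeping mean_seconds and std_seconds as plain floats is often convenient if the numbers feed into further arithmetic or plotting.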
answered Oct 20 '22 15:10 by jezrael


Pandas mean() and other aggregation methods accept a numeric_only=False parameter, which lets them aggregate timedelta columns directly:

dropped.groupby('bank').mean(numeric_only=False) 
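
A quick way to check this on your own pandas version is the sketch below, which uses made-up sample rows and only exercises mean(). Whether other aggregations such as std() accept timedelta columns this way may depend on the pandas release, so treat that as something to verify rather than a given.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataframe.
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["0 days 00:00:10", "0 days 00:00:30",
                             "1 days 00:00:00", "3 days 00:00:00"]),
})

# Ask groupby to include non-numeric columns such as timedelta64.
means = dropped.groupby("bank").mean(numeric_only=False)

print(means)
```

The result keeps the 'diff' column as a timedelta, so no conversion back from integers or seconds is needed.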

Found here: Aggregations for Timedelta values in the Python DataFrame

answered Oct 20 '22 15:10 by Alexander Usikov