
Take the mean of data within the same day in pandas

I have a DataFrame df containing the date of each measurement and the measurements themselves (duration, km):

df
Out[20]: 
                          Date duration km
0   2015-03-28 09:07:00.800001    0      0
1   2015-03-28 09:36:01.819998    1      2
2   2015-03-30 09:36:06.839997    1      3
3   2015-03-30 09:37:27.659997    nan    5
4   2015-04-22 09:51:40.440003    3      7
5   2015-04-23 10:15:25.080002    0      nan

How can I calculate the average duration and km per day? I would like to take the mean of the rows using groupby and the date...

asked Aug 07 '17 12:08 by gabboshow



2 Answers

I think you need resample:

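import pandas as pd

# all measurement columns, i.e. everything except 'Date'
cols = df.columns.difference(['Date'])

# Pick ONE of the conversions below, depending on the data:

# 1. if every value can be converted to float
df[cols] = df[cols].astype(float)

# 2. if astype fails because of non-numeric data, convert those values to NaN
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

# 3. if the columns have mixed dtypes
df[cols] = df[cols].astype(str).astype(float)
# or, alternatively:
# df[cols] = df[cols].astype(str).apply(pd.to_numeric, errors='coerce')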
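
To make the second option concrete, here is a minimal sketch of what errors='coerce' does. The frame raw and its values are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical messy input: numbers stored as strings, plus unparseable entries.
raw = pd.DataFrame({
    "duration": ["0", "1", "nan", "n/a"],
    "km": [0, "2", 3.5, None],
})

# to_numeric is applied column by column; anything it cannot parse becomes NaN.
converted = raw.apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
print(converted)
```

Both columns should come out as float64, with NaN wherever the input could not be read as a number, so mean() can skip those entries later.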

# 'Date' must be a datetime column, e.g. df['Date'] = pd.to_datetime(df['Date'])
df = df.resample('D', on='Date').mean().dropna(how='all')
print(df)
            duration   km
Date                     
2015-03-28       0.5  1.0
2015-03-30       1.0  4.0
2015-04-22       3.0  7.0
2015-04-23       0.0  NaN

Or:

df = df.set_index('Date').groupby(pd.Grouper(freq='D')).mean().dropna(how='all')
print(df)
            duration   km
Date                     
2015-03-28       0.5  1.0
2015-03-30       1.0  4.0
2015-04-22       3.0  7.0
2015-04-23       0.0  NaN
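
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample frame from the question (treating the nan entries as np.nan, which is an assumption about the asker's data) and compares both variants. The names all_days, daily and daily_grouper are only for illustration:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the "nan" entries are assumed to be missing values.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-03-28 09:07:00.800001",
        "2015-03-28 09:36:01.819998",
        "2015-03-30 09:36:06.839997",
        "2015-03-30 09:37:27.659997",
        "2015-04-22 09:51:40.440003",
        "2015-04-23 10:15:25.080002",
    ]),
    "duration": [0, 1, 1, np.nan, 3, 0],
    "km": [0, 2, 3, 5, 7, np.nan],
})

# resample produces one row per calendar day between the first and last date,
# so days without any measurement show up as all-NaN rows.
all_days = df.resample("D", on="Date").mean()

# dropna(how="all") removes only the rows where every column is NaN.
daily = all_days.dropna(how="all")

# The Grouper variant should give the same daily means.
daily_grouper = (
    df.set_index("Date")
      .groupby(pd.Grouper(freq="D"))
      .mean()
      .dropna(how="all")
)

print(len(all_days), len(daily))
print(daily)
```

mean() skips NaN by default, so a day with one valid and one missing value reports the valid value, and a day whose only value is missing stays NaN in that column.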
answered Oct 04 '22 03:10 by jezrael


Using groupby

In [896]: df.groupby(df.Date.dt.date)[['duration', 'km']].mean()
Out[896]:
            duration   km
Date
2015-03-28       0.5  1.0
2015-03-30       1.0  4.0
2015-04-22       3.0  7.0
2015-04-23       0.0  NaN
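
One thing to be aware of with dt.date: the resulting index holds plain datetime.date objects rather than timestamps. If a datetime index is wanted for later resampling or slicing, grouping by dt.normalize() (each timestamp set to midnight) is one possible alternative. This is a sketch on the question's sample data, with the nan entries assumed to be np.nan; by_date and by_midnight are illustrative names:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with "nan" assumed to mean a missing value.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-03-28 09:07:00.800001",
        "2015-03-28 09:36:01.819998",
        "2015-03-30 09:36:06.839997",
        "2015-03-30 09:37:27.659997",
        "2015-04-22 09:51:40.440003",
        "2015-04-23 10:15:25.080002",
    ]),
    "duration": [0, 1, 1, np.nan, 3, 0],
    "km": [0, 2, 3, 5, 7, np.nan],
})

value_cols = ["duration", "km"]

# Group by the calendar date: the index contains datetime.date objects.
by_date = df.groupby(df["Date"].dt.date)[value_cols].mean()

# Group by the timestamp set to midnight: the index stays a DatetimeIndex.
by_midnight = df.groupby(df["Date"].dt.normalize())[value_cols].mean()

print(type(by_date.index[0]))
print(type(by_midnight.index))
print(by_midnight)
```

Both give the same means; only the index type differs. Unlike resample, neither creates rows for days without data, so no dropna is needed.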
answered Oct 04 '22 04:10 by Zero