
Getting the average of a certain hour on weekdays over several years in a pandas dataframe

I have an hourly dataframe in the following format over several years:

Date/Time            Value
01.03.2010 00:00:00  60
01.03.2010 01:00:00  50
01.03.2010 02:00:00  52
01.03.2010 03:00:00  49
.
.
.
31.12.2013 23:00:00  77

I would like to average the data to get the mean of hour 0, hour 1, ..., hour 23 for each year.

So the output should look something like this:

Year Hour           Avg
2010 00              63
2010 01              55
2010 02              50
.
.
.
2013 22              71
2013 23              80

Does anyone know how to obtain this in pandas?

Markus W asked Jun 06 '13 16:06




1 Answer

Note: Now that Series have the dt accessor, it matters less whether the date is the index, though the Date/Time column still needs to have dtype datetime64.
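If Date/Time was read in as plain strings, it has to be converted before the dt accessor will work. A minimal sketch, assuming the column holds day-first strings exactly as shown in the question (the format string is inferred from that sample, not stated by the asker):

```python
import pandas as pd

# Sample rows copied from the question, in dd.mm.yyyy HH:MM:SS form.
df = pd.DataFrame({
    "Date/Time": ["01.03.2010 00:00:00", "01.03.2010 01:00:00",
                  "01.03.2010 02:00:00", "01.03.2010 03:00:00"],
    "Value": [60, 50, 52, 49],
})

# An explicit format (an assumption based on the sample) keeps
# 01.03.2010 from being read month-first as January 3rd.
df["Date/Time"] = pd.to_datetime(df["Date/Time"], format="%d.%m.%Y %H:%M:%S")

print(df.dtypes)
```

Passing the format explicitly is also usually faster than letting pandas guess it for each value.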

Update: You can do the groupby more directly (without the lambda):

In [21]: df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()
Out[21]:
                     Value
Date/Time Date/Time
2010      0             60
          1             50
          2             52
          3             49

In [22]: res = df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()

In [23]: res.index.names = ["year", "hour"]

In [24]: res
Out[24]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49
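To get the flat Year / Hour / Avg table the question asks for, one option is to name the grouping keys up front and reset the index afterwards. This is a sketch on made-up sample data (the values and the 2011 rows are hypothetical, chosen so that each group has something to average):

```python
import pandas as pd

# Hypothetical sample: two days in 2010 and one day in 2011.
df = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2010-03-01 00:00", "2010-03-01 01:00",
        "2010-03-02 00:00", "2010-03-02 01:00",
        "2011-03-01 00:00", "2011-03-01 01:00",
    ]),
    "Value": [60, 50, 70, 54, 80, 40],
})

ts = df["Date/Time"]

# Renaming the key Series gives the index levels readable names;
# reset_index then turns those levels into ordinary columns.
avg = (
    df.groupby([ts.dt.year.rename("Year"), ts.dt.hour.rename("Hour")])["Value"]
      .mean()
      .reset_index(name="Avg")
)

print(avg)
```

Here avg has one row per (Year, Hour) pair, with the columns Year, Hour and Avg.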

If the index is a datetime64 index (a DatetimeIndex), you can do:

In [31]: df1.groupby([df1.index.year, df1.index.hour]).mean()
Out[31]:
        Value
2010 0     60
     1     50
     2     52
     3     49
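The title mentions weekdays, although the question body does not. If only Monday to Friday should count, the rows could be filtered on the index before grouping. A sketch on hypothetical data, assuming "weekdays" means dayofweek 0 to 4:

```python
import pandas as pd

# Hypothetical sample with a DatetimeIndex spanning a weekend.
df1 = pd.DataFrame(
    {"Value": [60, 50, 100, 100, 70, 54]},
    index=pd.to_datetime([
        "2010-03-05 00:00", "2010-03-05 01:00",  # Friday
        "2010-03-06 00:00", "2010-03-06 01:00",  # Saturday
        "2010-03-08 00:00", "2010-03-08 01:00",  # Monday
    ]),
)

# dayofweek is 0 for Monday through 6 for Sunday, so < 5 keeps Mon-Fri.
weekdays = df1[df1.index.dayofweek < 5]

res = weekdays.groupby([weekdays.index.year, weekdays.index.hour]).mean()
res.index.names = ["year", "hour"]

print(res)
```

The Saturday rows are dropped, so each hourly mean is taken over the Friday and Monday values only.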

Old answer (will be slower):

Assuming Date/Time was the index* you can use a mapping function in the groupby:

In [11]: year_hour_means = df1.groupby(lambda x: (x.year, x.hour)).mean()

In [12]: year_hour_means
Out[12]:
           Value
(2010, 0)     60
(2010, 1)     50
(2010, 2)     52
(2010, 3)     49

For a more useful index, you could then create a MultiIndex from the tuples:

In [13]: year_hour_means.index = pd.MultiIndex.from_tuples(year_hour_means.index,
                                                           names=['year', 'hour'])

In [14]: year_hour_means
Out[14]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49

* If it is not, first use set_index:

df1 = df.set_index('Date/Time')
Andy Hayden answered Sep 29 '22 00:09