Getting the average of a certain hour on weekdays over several years in a pandas dataframe

I have a dataframe of hourly data spanning several years, in the following format:

Date/Time            Value
01.03.2010 00:00:00  60
01.03.2010 01:00:00  50
01.03.2010 02:00:00  52
01.03.2010 03:00:00  49
.
.
.
31.12.2013 23:00:00  77

I would like to compute the mean value for each hour of the day (hour 0, hour 1, ..., hour 23) within each year.

The output should look something like this:

Year Hour           Avg
2010 00              63
2010 01              55
2010 02              50
.
.
.
2013 22              71
2013 23              80

Does anyone know how to obtain this in pandas?

217

asked Jun 06 '13 16:06

Markus W

1 Answer

Note: Now that Series has the dt accessor, it matters less whether the date is the index, though Date/Time still needs to be a datetime64 column.
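If Date/Time was read in as plain strings, it has to be converted first. A minimal sketch, assuming the day.month.year layout shown in the question (the three sample rows here are made up for illustration):

```python
import pandas as pd

# Hypothetical sample in the question's "dd.mm.yyyy HH:MM:SS" layout.
df = pd.DataFrame({
    "Date/Time": ["01.03.2010 00:00:00", "01.03.2010 01:00:00", "31.12.2013 23:00:00"],
    "Value": [60, 50, 77],
})

# An explicit format avoids day/month ambiguity: "01.03.2010" is 1 March here.
df["Date/Time"] = pd.to_datetime(df["Date/Time"], format="%d.%m.%Y %H:%M:%S")

parsed_dtype = df["Date/Time"].dtype           # a datetime64 dtype
first_month = df["Date/Time"].dt.month.iloc[0]  # month of the first row
```

Once the column is datetime64, the dt accessor used in the rest of this answer becomes available.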

Update: You can do the groupby more directly (without the lambda):

In [21]: df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()
Out[21]:
                     Value
Date/Time Date/Time
2010      0             60
          1             50
          2             52
          3             49

In [22]: res = df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()

In [23]: res.index.names = ["year", "hour"]

In [24]: res
Out[24]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49

If Date/Time is a datetime64 index rather than a column, you can group on the index attributes directly:

In [31]: df1.groupby([df1.index.year, df1.index.hour]).mean()
Out[31]:
        Value
2010 0     60
     1     50
     2     52
     3     49
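The title mentions weekdays, which the examples here do not filter for. If only Monday to Friday should count, one possible approach is to drop weekend rows before grouping. A small sketch on made-up data, assuming Date/Time is already the index:

```python
import pandas as pd

# Hypothetical data: 2010-03-05 is a Friday, 2010-03-06 a Saturday,
# and 2010-03-08 a Monday.
idx = pd.to_datetime(["2010-03-05 00:00", "2010-03-06 00:00", "2010-03-08 00:00"])
df1 = pd.DataFrame({"Value": [60, 100, 50]}, index=idx)

# dayofweek counts Monday as 0, so values below 5 are Monday to Friday.
weekdays = df1[df1.index.dayofweek < 5]

res = weekdays.groupby([weekdays.index.year, weekdays.index.hour]).mean()
res.index.names = ["year", "hour"]
```

The Saturday reading is excluded, so the hour-0 mean for 2010 is computed from the Friday and Monday values only.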

Old answer (will be slower):

Assuming Date/Time was the index* you can use a mapping function in the groupby:

In [11]: year_hour_means = df1.groupby(lambda x: (x.year, x.hour)).mean()

In [12]: year_hour_means
Out[12]:
           Value
(2010, 0)     60
(2010, 1)     50
(2010, 2)     52
(2010, 3)     49

For a more useful index, you could then create a MultiIndex from the tuples:

In [13]: year_hour_means.index = pd.MultiIndex.from_tuples(year_hour_means.index,
                                                           names=['year', 'hour'])

In [14]: year_hour_means
Out[14]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49

* if not, then first use set_index:

df1 = df.set_index('Date/Time')
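As a sanity check, the older lambda-based approach and the direct index-attribute approach can be compared side by side. The sketch below uses made-up data and is only meant to illustrate that both produce the same means:

```python
import pandas as pd

# Hypothetical data with Date/Time as an ordinary column.
df = pd.DataFrame({
    "Date/Time": pd.to_datetime(["2010-03-01 00:00", "2010-03-02 00:00", "2010-03-01 01:00"]),
    "Value": [60, 66, 50],
})
df1 = df.set_index("Date/Time")

# Older approach: a Python function is called once per index label.
old = df1.groupby(lambda ts: (ts.year, ts.hour)).mean()
old.index = pd.MultiIndex.from_tuples(old.index, names=["year", "hour"])

# Direct approach: group on the vectorized index attributes.
new = df1.groupby([df1.index.year, df1.index.hour]).mean()
new.index.names = ["year", "hour"]

same_values = old["Value"].tolist() == new["Value"].tolist()
```

The direct version avoids calling a Python function for every row, which is why the lambda version is described above as slower.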

150

answered Sep 29 '22 00:09

Andy Hayden

Tags: python, datetime, pandas, average, statistics
