Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas group hourly data into daily sums with date index

I am working on a code that takes hourly data for a month and groups it into 24 hour sums. My problem is that I would like the index to read the date/year and I am just getting an index of 1-30.

The code I am using is

df = df.iloc[:,16:27].groupby([lambda x: x.day]).sum()

example of output I am getting

DateTime     data
1            1772.031568
2            19884.42243
3            28696.72159
4            24906.20355
5            9059.120325

example of output I would like

DateTime     data
1/1/2017     1772.031568
1/2/2017     19884.42243
1/3/2017     28696.72159
1/4/2017     24906.20355
1/5/2017     9059.120325
like image 468
acb Avatar asked Mar 06 '17 16:03

acb


People also ask

How do I convert hourly data to daily data in pandas?

Resample Hourly Data to Daily Dataresample() method. To aggregate or temporal resample the data for a time period, you can take all of the values for each day and summarize them. In this case, you want total daily rainfall, so you will use the resample() method together with . sum() .

How do I combine a date and time column in pandas?

A Timestamp object in pandas is an equivalent of Python's datetime object. It is a combination of date and time fields. To combine date and time into a Timestamp object, we use the Timestamp. combine() function in pandas .

How do you handle dates in a dataset in pandas explain with examples?

Pandas has a built-in function called to_datetime()that converts date and time in string format to a DateTime object. As you can see, the 'date' column in the DataFrame is currently of a string-type object. Thus, to_datetime() converts the column to a series of the appropriate datetime64 dtype.


2 Answers

This is an old question, but I don't think the accepted solution is the best in this particular case. What you want to accomplish is to down sample time series data, and Pandas has built-in functionality for this called resample(). For your example you will do:

df = df.iloc[:,16:27].resample('D').sum()

or if the datetime column is not the index

df = df.iloc[:,16:27].resample('D', on='datetime_column_name').sum()

There are (at least) 2 benefits from doing it this way as opposed to accepted answer:

  1. Resample can up sample and down sample, groupby() can only down sample
  2. No lambdas, list comprehensions or date formatting functions required.

For more information and examples, see documentation here: resample()

like image 161
Marius Avatar answered Sep 29 '22 07:09

Marius


If your index is a datetime, you can build a combined groupby clause:

df = df.iloc[:,16:27].groupby([lambda x: "{}/{}/{}".format(x.day, x.month, x.year)]).sum()

or even better:

df = df.iloc[:,16:27].groupby([lambda x: x.strftime("%d%m%Y")]).sum()
like image 30
Nicolás Ozimica Avatar answered Sep 29 '22 07:09

Nicolás Ozimica