Pandas: DataFrame groupby for year/month and return with new DatetimeIndex

I need some directions on grouping a pandas DataFrame object by year or month and getting back a new DataFrame object with a new index. Here is my code so far; groupby works as intended.

Load the data from a .csv file and parse 'Date' as dates (historical stock quotes from finance.yahoo.com):

In [23]: import pandas as pd
         file = pd.read_csv("sdf.de.csv", parse_dates=['Date'])
         file.head(2)

Out[23]:
    Date        Open    High    Low     Close   Volume  Adj Close
0   2016-02-16  18.650  18.70   17.940  18.16   1720800 17.0600
1   2016-02-15  18.295  18.64   18.065  18.50   1463500 17.3794

Sort by 'Date' ascending and set the index to 'Date':

In [24]: daily = file.sort_values(by='Date').set_index('Date')
         daily.head()

Out[24]:
            Open    High    Low     Close   Volume  Adj Close
Date                        
2000-01-03  14.20   14.50   14.15   14.40   277400  2.7916
2000-01-04  14.29   14.30   13.90   14.15   109200  2.7431

Grouping by month

I would then do an additional apply() on the groups to condense the data for each group, e.g. find the highest High value for the year/month or sum() the Volume values. That step is omitted here; a rough sketch follows after the output below.

In [39]: monthly = daily.groupby(lambda x: (x.year, x.month))
         monthly.first()

Out[39]:
            Open    High    Low     Close   Volume  Adj Close
(2000, 1)   14.200  14.500  14.150  14.400  277400  2.7916
(2000, 2)   13.900  14.390  13.900  14.250  287200  2.7625
... ... ... ... ... ... ...
(2016, 1)   23.620  23.620  23.620  23.620  0       22.1893
(2016, 2)   19.575  19.630  19.140  19.450  1783000 18.2719
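The omitted aggregation step could look something like the following; this is only a sketch on top of the monthly groupby above, and the chosen columns (High, Volume) and the name summary are just for illustration:

    # condense each (year, month) group: highest High, total traded Volume
    summary = monthly.agg({'High': 'max', 'Volume': 'sum'})
    summary.head(2)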

The grouping works, but it gives me a DataFrame object with a tuple as its index.

The desired result, in this case when grouping by month, would be a completely new DataFrame object whose Date index is a new DatetimeIndex in the form %Y-%m, or just %Y when grouping by year.

Desired output:
        Open    High    Low     Close   Volume  Adj Close
Date
2000-01 14.200  14.500  14.150  14.400  277400  2.7916
2000-02 13.900  14.390  13.900  14.250  287200  2.7625
... ... ... ... ... ... ...
2016-01 23.620  23.620  23.620  23.620  0       22.1893
2016-02 19.575  19.630  19.140  19.450  1783000 18.2719

I'm thankful for any directions.

asked Feb 18 '16 by dirk


2 Answers

You can use groupby with daily.index.year and daily.index.month, or convert the index with to_period and then group by the index:

print(daily)
              Open   High    Low  Close   Volume  Adj Close
Date                                                       
2000-01-01  14.200  14.50  14.15  14.40   277400     2.7916
2000-02-01  13.900  14.39  13.90  14.25   287200     2.7625
2016-01-01  23.620  23.62  23.62  23.62        0    22.1893
2016-02-01  19.575  19.63  19.14  19.45  1783000    18.2719

print(daily.groupby([daily.index.year, daily.index.month]).first())
          Open   High    Low  Close   Volume  Adj Close
2000 1  14.200  14.50  14.15  14.40   277400     2.7916
     2  13.900  14.39  13.90  14.25   287200     2.7625
2016 1  23.620  23.62  23.62  23.62        0    22.1893
     2  19.575  19.63  19.14  19.45  1783000    18.2719

daily.index = daily.index.to_period('M')
print(daily.groupby(daily.index).first())
           Open   High    Low  Close   Volume  Adj Close
Date                                                    
2000-01  14.200  14.50  14.15  14.40   277400     2.7916
2000-02  13.900  14.39  13.90  14.25   287200     2.7625
2016-01  23.620  23.62  23.62  23.62        0    22.1893
2016-02  19.575  19.63  19.14  19.45  1783000    18.2719
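The yearly (%Y) grouping from the question works the same way; a minimal sketch continuing from the monthly PeriodIndex above (the annual frequency alias is written 'A' here, spelled 'Y' in newer pandas):

    # after to_period('M') the index is a monthly PeriodIndex;
    # asfreq moves it to annual frequency before grouping
    print(daily.groupby(daily.index.asfreq('A')).first())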
answered by jezrael


You can use list comprehensions to access the year and month attributes of your timestamps and then group on those.

>>> df.groupby([[d.year for d in df.Date], [d.month for d in df.Date]]).first()
             Date    Open   High    Low  Close   Volume  Adj_Close
2000 1 2000-01-01  14.200  14.50  14.15  14.40   277400     2.7916
     2 2000-02-01  13.900  14.39  13.90  14.25   287200     2.7625
2016 1 2016-01-01  23.620  23.62  23.62  23.62        0    22.1893
     2 2016-02-01  19.575  19.63  19.14  19.45  1783000    18.2719
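To get the %Y-%m style index the question asks for instead of a (year, month) MultiIndex, the result's index can be rebuilt as a monthly PeriodIndex afterwards; a sketch, where result is only a hypothetical name for the frame returned above:

    import pandas as pd

    result = df.groupby([[d.year for d in df.Date], [d.month for d in df.Date]]).first()
    # rebuild the (year, month) tuples as monthly periods, e.g. 2000-01, 2000-02, ...
    result.index = pd.PeriodIndex([pd.Period(year=y, month=m, freq='M')
                                   for y, m in result.index])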
answered by Alexander