I have a time series that spans a few years, in the following format:
timestamp open high low close volume
0 2009-01-02 05:00:00 900.00 906.75 898.00 904.75 15673.0
1 2009-01-02 05:30:00 904.75 907.75 903.75 905.50 4600.0
2 2009-01-02 06:00:00 905.50 907.25 904.50 904.50 3472.0
3 2009-01-02 06:30:00 904.50 905.00 903.25 904.75 6074.0
4 2009-01-02 07:00:00 904.75 905.50 897.00 898.25 12538.0
What would be the simplest way to split that DataFrame into multiple DataFrames, each holding 1 week or 1 month worth of data?
EDIT: As an example, a DataFrame containing 1 year of data would be split into 52 DataFrames, each containing a week of data, and returned as a list of 52 DataFrames.
(The data can be reconstructed with the code below.)
import pandas as pd
from pandas import Timestamp
dikt={'close': {0: 904.75, 1: 905.5, 2: 904.5, 3: 904.75, 4: 898.25}, 'low': {0: 898.0, 1: 903.75, 2: 904.5, 3: 903.25, 4: 897.0}, 'open': {0: 900.0, 1: 904.75, 2: 905.5, 3: 904.5, 4: 904.75}, 'high': {0: 906.75, 1: 907.75, 2: 907.25, 3: 905.0, 4: 905.5}, 'volume': {0: 15673.0, 1: 4600.0, 2: 3472.0, 3: 6074.0, 4: 12538.0}, 'timestamp': {0: Timestamp('2009-01-02 05:00:00'), 1: Timestamp('2009-01-02 05:30:00'), 2: Timestamp('2009-01-02 06:00:00'), 3: Timestamp('2009-01-02 06:30:00'), 4: Timestamp('2009-01-02 07:00:00')}}
df = pd.DataFrame(dikt, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
pd.TimeGrouper is deprecated and will be removed; you can use pd.Grouper instead.
weeks = [g for n, g in df.groupby(pd.Grouper(key='timestamp',freq='W'))]
months = [g for n, g in df.groupby(pd.Grouper(key='timestamp',freq='M'))]  # pandas >= 2.2 deprecates 'M' in favor of 'ME'
This way you can also avoid setting the timestamp column as the index.
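As a quick check of the approach above, here is a minimal sketch on synthetic data. The `make_frame` helper and the random prices are assumptions for illustration only, not part of the question's data. It also filters out empty groups, which a time-based grouper may yield when a whole week contains no rows.

```python
import numpy as np
import pandas as pd


def make_frame(start="2009-01-05", days=14):
    """Hypothetical helper: build 30-minute bars of synthetic data."""
    ts = pd.date_range(start, periods=days * 48, freq="30min")
    rng = np.random.default_rng(0)
    close = 900 + rng.normal(0, 1, len(ts)).cumsum()
    return pd.DataFrame({"timestamp": ts, "close": close})


df = make_frame()
grouper = pd.Grouper(key="timestamp", freq="W")  # "W" bins end on Sunday

# One DataFrame per calendar week, skipping weeks with no rows.
weeks = [g for _, g in df.groupby(grouper) if not g.empty]

# Same split, keyed by the week-end label for easier lookup.
weeks_by_label = {label: g for label, g in df.groupby(grouper) if not g.empty}
```

The list form matches what the question asks for; the dict form is handy when you need to find the frame for a particular week.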
Also, if your timestamp is part of a MultiIndex, you can refer to it using the level parameter (e.g. pd.Grouper(level='timestamp', freq='W')). Thanks @jtromans for the heads-up.
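For the MultiIndex case, a small sketch might look like the following. The index layout (a `timestamp` level plus a `symbol` level) and the symbol names are made up for illustration.

```python
import pandas as pd

# Assumed layout: a two-level index of (timestamp, symbol); values are placeholders.
times = pd.date_range("2009-01-05", periods=14 * 48, freq="30min")
index = pd.MultiIndex.from_product(
    [times, ["ES", "NQ"]], names=["timestamp", "symbol"]
)
df_mi = pd.DataFrame({"close": range(len(index))}, index=index)

# Group on the 'timestamp' level instead of a column; no reset_index needed.
weekly = [
    g
    for _, g in df_mi.groupby(pd.Grouper(level="timestamp", freq="W"))
    if not g.empty
]
```

Each element of `weekly` keeps the full MultiIndex, so rows for every symbol in that week stay together.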