I've got a DataFrame storing daily data, as below:
    Date             Open       High        Low      Close   Volume
    2010-01-04  38.660000  39.299999  38.509998  39.279999  1293400
    2010-01-05  39.389999  39.520000  39.029999  39.430000  1261400
    2010-01-06  39.549999  40.700001  39.020000  40.250000  1879800
    2010-01-07  40.090000  40.349998  39.910000  40.090000   836400
    2010-01-08  40.139999  40.310001  39.720001  40.290001   654600
    2010-01-11  40.209999  40.520000  40.040001  40.290001   963600
    2010-01-12  40.160000  40.340000  39.279999  39.980000  1012800
    2010-01-13  39.930000  40.669998  39.709999  40.560001  1773400
    2010-01-14  40.490002  40.970001  40.189999  40.520000  1240600
    2010-01-15  40.570000  40.939999  40.099998  40.450001  1244200
What I intend to do is aggregate it into weekly data, which should look like this:
    Date             Open       High        Low      Close   Volume
    2010-01-04  38.660000  40.700001  38.509998  40.290001  5925600
    2010-01-11  40.209999  40.970001  39.279999  40.450001  6234600
Currently, my code snippet is as below. Which function should I use to map the daily data to the expected weekly data? Many thanks!
    import datetime
    import pandas_datareader.data as web

    start = datetime.datetime(2010, 1, 1)
    end = datetime.datetime(2016, 12, 31)
    # `session` is assumed to be a requests.Session created earlier (not shown)
    f = web.DataReader("MNST", "yahoo", start, end, session=session)
    print(f)
Method 1: using Python for-loops. The function new_case_count() takes a DataFrame, iterates over it and converts the index values, which are dates stored as strings, to pandas datetimes. Based on each date's day of the week, it computes each week's count of new cases and stores it in a list.
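The code for new_case_count() is not shown, so here is a hedged sketch of the same loop-over-rows idea applied to this question's OHLC data instead. The helper weekly_ohlc_loop is hypothetical: it rolls each date back to its Monday using dayofweek and accumulates first/max/min/last/sum by hand. It assumes the frame is sorted by a DatetimeIndex, and it will generally be slower than the vectorized resample approaches on large frames.

```python
import pandas as pd


def weekly_ohlc_loop(daily):
    """Hypothetical helper: aggregate daily OHLCV rows into Monday-labelled weeks with a loop.

    Assumes `daily` is sorted by a DatetimeIndex and has Open/High/Low/Close/Volume columns.
    """
    weeks = {}  # Monday of the week -> running aggregate for that week
    for row in daily.itertuples():
        ts = row.Index
        monday = (ts - pd.Timedelta(days=ts.dayofweek)).normalize()  # dayofweek: Monday=0
        if monday not in weeks:
            # The first row of a new week sets all five fields
            weeks[monday] = {"Open": row.Open, "High": row.High, "Low": row.Low,
                             "Close": row.Close, "Volume": row.Volume}
        else:
            w = weeks[monday]
            w["High"] = max(w["High"], row.High)
            w["Low"] = min(w["Low"], row.Low)
            w["Close"] = row.Close  # overwritten each day, so it ends as the week's last close
            w["Volume"] += row.Volume
    out = pd.DataFrame.from_dict(weeks, orient="index")
    out.index = pd.DatetimeIndex(out.index, name="Date")
    return out


# Four rows taken from the question's sample (Thu/Fri of one week, Mon/Tue of the next)
daily = pd.DataFrame(
    {"Open": [40.09, 40.139999, 40.209999, 40.16],
     "High": [40.349998, 40.310001, 40.52, 40.34],
     "Low": [39.91, 39.720001, 40.040001, 39.279999],
     "Close": [40.09, 40.290001, 40.290001, 39.98],
     "Volume": [836400, 654600, 963600, 1012800]},
    index=pd.to_datetime(["2010-01-07", "2010-01-08", "2010-01-11", "2010-01-12"]),
)
weekly = weekly_ohlc_loop(daily)
print(weekly)
```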
Click a cell in the date column of the pivot table that Excel created in the spreadsheet. Right-click and select "Group," then "Days." Enter "7" in the "Number of days" box to group by week. Click "OK" and verify that you have correctly converted daily data to weekly data.
The pandas dayofweek property returns the day of the week, with Monday=0 and Sunday=6: the week is assumed to start on Monday (0) and end on Sunday (6). It is available both on Series with datetime values (via the dt accessor) and on DatetimeIndex.
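A small sketch of dayofweek on both a DatetimeIndex and a Series (through .dt), using dates from the question's range plus a Sunday. The last line, which rolls each date back to the Monday of its week, is an illustrative extra that is not part of the text above, but it is one way to build Monday week labels.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2010-01-04", "2010-01-08", "2010-01-10"])  # Mon, Fri, Sun
print(idx.dayofweek)  # on a DatetimeIndex

s = pd.Series(idx)
print(s.dt.dayofweek)  # on a Series of datetimes, via the .dt accessor

# Subtract dayofweek days to land on the Monday of each date's week
mondays = idx - pd.to_timedelta(idx.dayofweek, unit="D")
print(mondays)
```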
You can resample (to weekly), offset (shift) the labels, and apply per-column aggregation rules as follows:
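    import pandas as pd

    # How each column is aggregated within a week
    logic = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}

    # Read the question's table (copied to the clipboard) with Date as the index
    f = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])

    # 'W' bins Monday..Sunday and labels each week by its Sunday
    weekly = f.resample('W').agg(logic)
    # Shift the labels back 6 days to Monday (replaces loffset, removed in pandas 2.0)
    weekly.index = weekly.index - pd.Timedelta(days=6)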
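To check the aggregation rules without the clipboard, here is a self-contained sketch that inlines the question's ten daily rows. It uses the same logic mapping, but moves the labels to Monday by subtracting 6 days from the index, since the loffset argument was removed in pandas 2.0. The use of bdate_range for the dates is an assumption that happens to match the sample (ten consecutive weekdays).

```python
import pandas as pd

# The daily sample from the question; bdate_range yields exactly the 10 weekdays shown
daily = pd.DataFrame(
    {
        "Open": [38.66, 39.389999, 39.549999, 40.09, 40.139999,
                 40.209999, 40.16, 39.93, 40.490002, 40.57],
        "High": [39.299999, 39.52, 40.700001, 40.349998, 40.310001,
                 40.52, 40.34, 40.669998, 40.970001, 40.939999],
        "Low": [38.509998, 39.029999, 39.02, 39.91, 39.720001,
                40.040001, 39.279999, 39.709999, 40.189999, 40.099998],
        "Close": [39.279999, 39.43, 40.25, 40.09, 40.290001,
                  40.290001, 39.98, 40.560001, 40.52, 40.450001],
        "Volume": [1293400, 1261400, 1879800, 836400, 654600,
                   963600, 1012800, 1773400, 1240600, 1244200],
    },
    index=pd.bdate_range("2010-01-04", "2010-01-15", name="Date"),
)

logic = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}

weekly = daily.resample("W").agg(logic)             # Mon..Sun bins, labelled by Sunday
weekly.index = weekly.index - pd.Timedelta(days=6)  # relabel each week by its Monday
print(weekly)
```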
to get:
                     Open       High        Low      Close   Volume
    Date
    2010-01-04  38.660000  40.700001  38.509998  40.290001  5925600
    2010-01-11  40.209999  40.970001  39.279999  40.450001  6234600
In general, assuming that you have the DataFrame in the form you specified, you need to do the following steps:

1. Put Date in the index.
2. Resample the index.

What you have is a case of applying different functions to different columns (see the pandas documentation on aggregation).
You can resample in various ways, e.g. by taking the mean or the count of the values; check the pandas resample documentation.
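A minimal sketch of two of those built-in reductions on a weekly resample. The series is hypothetical toy data (the values 0 to 9 on the ten business days of the question's date range), chosen so the weekly results are easy to read.

```python
import pandas as pd

# Hypothetical toy series: one value per business day, Mon 2010-01-04 .. Fri 2010-01-15
close = pd.Series(range(10), index=pd.bdate_range("2010-01-04", "2010-01-15"), dtype=float)

weekly_mean = close.resample("W").mean()    # average of each Monday..Sunday week
weekly_count = close.resample("W").count()  # number of daily rows in each week
print(weekly_mean)
print(weekly_count)
```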
You can also apply custom aggregators (see the same documentation). With that in mind, the code for your case can be written as:
    import pandas as pd

    # Put Date in a sorted DatetimeIndex
    f['Date'] = pd.to_datetime(f['Date'])
    f.set_index('Date', inplace=True)
    f.sort_index(inplace=True)

    def take_first(array_like):
        return array_like.iloc[0]   # first value of the week (positional)

    def take_last(array_like):
        return array_like.iloc[-1]  # last value of the week (positional)

    output = f.resample('W').agg({'Open': take_first,  # weekly resample
                                  'High': 'max', 'Low': 'min',
                                  'Close': take_last, 'Volume': 'sum'})
    output.index = output.index - pd.Timedelta(days=6)  # put the labels on Monday
    output = output[['Open', 'High', 'Low', 'Close', 'Volume']]
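Here, 'W' signifies a weekly resampling; it is an alias for 'W-SUN', so by default each bin spans Monday to Sunday and is labelled with its Sunday. To keep the labels on Monday, the labels are shifted back 6 days (older pandas versions did this with the loffset argument, which was removed in pandas 2.0). There are several predefined anchored weekly offsets, such as 'W-MON' or 'W-FRI'; take a look at the pandas offset aliases. You can even define custom offsets.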
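An alternative to shifting the labels, not used in the answer above, is to anchor the weeks on Monday directly: 'W-MON' with closed='left' and label='left' makes each bin [Monday, next Monday) and labels it with that Monday. The sketch compares both labellings on hypothetical toy data (a 1 for every business day in the question's two weeks).

```python
import pandas as pd

# Hypothetical toy data: a 1 for each business day, Mon 2010-01-04 .. Fri 2010-01-15
ones = pd.Series(1, index=pd.bdate_range("2010-01-04", "2010-01-15"))

# Default 'W' (= 'W-SUN'): weeks end on Sunday and are labelled by Sunday
by_sunday = ones.resample("W").sum()

# Monday-anchored, left-closed and left-labelled: [Mon, next Mon), labelled by Monday
by_monday = ones.resample("W-MON", closed="left", label="left").sum()

print(by_sunday)
print(by_monday)
```

With this form no label shift is needed afterwards, and the same closed/label arguments work with any aggregation mapping passed to agg.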
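Coming back to the resampling method: for Open and Close you can specify custom functions that take the first or last value of each week, and pass those function handles in the per-column aggregation mapping. Older pandas versions accepted this mapping through a how argument of resample, which is no longer supported.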
This answer assumes that the data is daily, i.e. there is only one entry per day, and that no data is present for non-business days (Saturday and Sunday). So taking the last data point of the week as Friday's is fine. If you prefer, you can use a business-week frequency instead of 'W'. Also, for more complex data you may want to use groupby to group the weekly data and then work on the time indices within each group.
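A hedged sketch of the groupby route, using pd.Grouper with the same Monday-anchored bins. The data is hypothetical: the question's closes with Friday 2010-01-15 dropped (as if it were a market holiday), so the second week actually ends on a Thursday. Inside each group the daily index is still available, which here is used to find each week's real last trading day.

```python
import pandas as pd

# Hypothetical sample: the question's closes without Friday 2010-01-15
dates = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08",
                        "2010-01-11", "2010-01-12", "2010-01-13", "2010-01-14"])
daily = pd.DataFrame({"Close": [39.279999, 39.43, 40.25, 40.09, 40.290001,
                                40.290001, 39.98, 40.560001, 40.52]},
                     index=pd.DatetimeIndex(dates, name="Date"))

summary = {}
# Monday-labelled weeks, [Mon, next Mon), same bins as resample('W-MON', closed='left', label='left')
for week_start, week in daily.groupby(pd.Grouper(freq="W-MON", closed="left", label="left")):
    last_day = week.index.max()  # the actual last trading day in this week
    summary[week_start] = (last_day, week.loc[last_day, "Close"])

for week_start, (last_day, close) in summary.items():
    print(week_start.date(), last_day.date(), close)
```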
By the way, a gist with the solution can be found at: https://gist.github.com/prithwi/339f87bf9c3c37bb3188