I have a dataframe with a timeseries data of wheat in df
.
df = wt["WHEAT_USD"]
2016-05-02 02:00:00+02:00 4.780
2016-05-02 02:01:00+02:00 4.777
2016-05-02 02:02:00+02:00 4.780
2016-05-02 02:03:00+02:00 4.780
2016-05-02 02:04:00+02:00 4.780
Name: closeAsk, dtype: float64
When I plot the data it has these annoying horizontal lines because of weekends. Is there a simple way of removing the non-business days from the dataframe itself?
Something like
df = df.BDays()
You can delete a list of rows from Pandas by passing the list of indices to the drop() method. In this code, [5,6] is the index of the rows you want to delete. axis=0 denotes that rows should be deleted from the dataframe.
You can set 'keep=False' in the drop_duplicates() function to remove all the duplicate rows. For E.x, df. drop_duplicates(keep=False) .
When the function receives the date string it will first use the Pandas to_datetime() function to convert it to a Python datetime and it will then use the timedelta() function to subtract the number of days defined in the days variable.
One simple solution is to slice out the days not in Monday to Friday:
In [11]: s[s.index.dayofweek < 5] Out[11]: 2016-05-02 00:00:00 4.780 2016-05-02 00:01:00 4.777 2016-05-02 00:02:00 4.780 2016-05-02 00:03:00 4.780 2016-05-02 00:04:00 4.780 Name: closeAsk, dtype: float64
Note: this doesn't take into account bank holidays etc.
Pandas BDay
just ends up using .dayofweek<5
like the chosen answer, but can be extended to account for bank holidays, etc.
import pandas as pd
from pandas.tseries.offsets import BDay
isBusinessDay = BDay().onOffset
csv_path = 'C:\\Python27\\Lib\\site-packages\\bokeh\\sampledata\\daylight_warsaw_2013.csv'
dates_df = pd.read_csv(csv_path)
match_series = pd.to_datetime(dates_df['Date']).map(isBusinessDay)
dates_df[match_series]
I am building a backtester for stock/FX trading and I also have these issue with days that are nan because that they are holidays or other non trading days.. you can download a financial calendar for the days that there is no trading and then you need to think about timezone and weekends.. etc..
But the best solution is not to use date/time as the index for the candles or price. So do not connect your price data to a date/time but just to a counter of candles or prices .. you can use a second index for this.. so for calculations of MA or other technical lines dont use date/time .. if you look at Metatrader 4/5 it also doesnt use date/time but the index of the data is the candle number !!
I think that you need to let go of the date-time for the price if you work with stock or FX data , of cause you can put them in a column of the data-frame but dont use it as the index This way you can avoid many problems
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With