I have minute-by-minute stock data from 2017 to 2019. I want to keep only data from 9:16 onwards for each day, so I want to collapse everything between 9:00 and 9:16 into the 9:16 row, i.e. the value at 09:16 should be:

open: the open of the first row between 9:00 and 9:16, here 116.00
high: the highest high between 9:00 and 9:16, here 117.80
low: the lowest low between 9:00 and 9:16, here 116.00
close: the close at 9:16, here 113.00

open high low close
date
2017-01-02 09:08:00 116.00 116.00 116.00 116.00
2017-01-02 09:16:00 116.10 117.80 117.00 113.00
2017-01-02 09:17:00 115.50 116.20 115.50 116.20
2017-01-02 09:18:00 116.05 116.35 116.00 116.00
2017-01-02 09:19:00 116.00 116.00 115.60 115.75
... ... ... ... ...
2029-12-29 15:56:00 259.35 259.35 259.35 259.35
2019-12-29 15:57:00 260.00 260.00 260.00 260.00
2019-12-29 15:58:00 260.00 260.00 259.35 259.35
2019-12-29 15:59:00 260.00 260.00 260.00 260.00
2019-12-29 16:36:00 259.35 259.35 259.35 259.35
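To make the target concrete, here is a small sketch (the names window and expected_0916 are mine, and it uses only the two 2017-01-02 rows from the table above) of how the 09:16 values should be derived:

import pandas as pd

# the two 2017-01-02 rows that fall inside the 09:00 - 09:16 window, copied from above
window = pd.DataFrame(
    {"open": [116.00, 116.10], "high": [116.00, 117.80],
     "low": [116.00, 117.00], "close": [116.00, 113.00]},
    index=pd.to_datetime(["2017-01-02 09:08:00", "2017-01-02 09:16:00"]),
)

expected_0916 = {
    "open": window["open"].iloc[0],     # 116.00, first open in the window
    "high": window["high"].max(),       # 117.80, highest high in the window
    "low": window["low"].min(),         # 116.00, lowest low in the window
    "close": window["close"].iloc[-1],  # 113.00, the close at 09:16
}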
Here is what I tried:

# get the 9:00 - 9:16 window and build a single data item from it
convertPreTrade = df.between_time("09:00", "09:16")  # 09:00 - 09:16

# write the combined values back into the original data
df.loc[df.index.strftime("%H:%M") == "09:16",
       ["open", "high", "low", "close"]] = [convertPreTrade["open"][0],
                                            convertPreTrade["high"].max(),
                                            convertPreTrade["low"].min(),
                                            convertPreTrade["close"][-1]]
but this won't give me accurate data.
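The reason it is not accurate: the right-hand side of that .loc assignment is evaluated once over the whole 2017-2019 range, so every day's 09:16 row receives the same four numbers. A small illustration, reusing convertPreTrade from the attempt above:

# convertPreTrade holds the 09:00 - 09:16 window of EVERY trading day, so these
# scalars mix data from different days instead of being computed per day:
convertPreTrade = df.between_time("09:00", "09:16")
global_high = convertPreTrade["high"].max()  # highest pre-9:16 high of any day
global_low = convertPreTrade["low"].min()    # lowest pre-9:16 low of any day
# the .loc assignment then writes the same [open, high, low, close] into
# every 09:16 row across all days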
d = {'date': 'last', 'open': 'first',
     'high': 'max', 'low': 'min', 'close': 'last'}

# df.index = pd.to_datetime(df.index)  # make sure the index is a DatetimeIndex
s1 = df.between_time('09:00:00', '09:16:00')
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')
df1 = pd.concat([df.drop(s1.index), s2]).sort_index()
Use DataFrame.between_time to filter the rows of the dataframe df that fall between 09:00 and 09:16:
print(s1)
open high low close
date
2017-01-02 09:08:00 116.0 116.0 116.0 116.0
2017-01-02 09:16:00 116.1 117.8 117.0 113.0
Use DataFrame.groupby to group the filtered dataframe s1 by date and aggregate it using the dictionary d:
print(s2)
open high low close
date
2017-01-02 09:16:00 116.0 117.8 116.0 113.0
Use DataFrame.drop to drop the rows of the original dataframe df that fall between 09:00 and 09:16, then use pd.concat to concatenate the result with s2, and finally use DataFrame.sort_index to sort the index:
print(df1)
open high low close
date
2017-01-02 09:16:00 116.00 117.80 116.00 113.00
2017-01-02 09:17:00 115.50 116.20 115.50 116.20
2017-01-02 09:18:00 116.05 116.35 116.00 116.00
2017-01-02 09:19:00 116.00 116.00 115.60 115.75
2019-12-29 15:57:00 260.00 260.00 260.00 260.00
2019-12-29 15:58:00 260.00 260.00 259.35 259.35
2019-12-29 15:59:00 260.00 260.00 260.00 260.00
2019-12-29 16:36:00 259.35 259.35 259.35 259.35
2029-12-29 15:56:00 259.35 259.35 259.35 259.35
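Since the stated goal is to keep only data from 09:16 onwards, a final between_time call on df1 can drop anything earlier that might remain (a minimal sketch; df_after is a name introduced here, and the end time only needs to be at or after the last traded minute):

# keep only rows timestamped 09:16 or later on each day
df_after = df1.between_time('09:16', '23:59')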
Extract the rows from 9:00 to 9:16, group them by year, month, and day, and compute the OHLC values; the logic follows your code. Finally, add a date column set to 09:16. Since we don't have all of your data, some details may be missing, but the basic form remains the same.
import pandas as pd
import numpy as np
import io
data = '''
date open high low close
"2017-01-02 09:08:00" 116.00 116.00 116.00 116.00
"2017-01-02 09:16:00" 116.10 117.80 117.00 113.00
"2017-01-02 09:17:00" 115.50 116.20 115.50 116.20
"2017-01-02 09:18:00" 116.05 116.35 116.00 116.00
"2017-01-02 09:19:00" 116.00 116.00 115.60 115.75
"2017-01-03 09:08:00" 259.35 259.35 259.35 259.35
"2017-01-03 09:09:00" 260.00 260.00 260.00 260.00
"2017-12-03 09:18:00" 260.00 260.00 259.35 259.35
"2017-12-04 09:05:00" 260.00 260.00 260.00 260.00
"2017-12-04 09:22:00" 259.35 259.35 259.35 259.35
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
df.reset_index(drop=True, inplace=True)
df['date'] = pd.to_datetime(df['date'])

# rows between 9:00 and 9:16 (inclusive)
df_start = df[(df['date'].dt.hour == 9) & (df['date'].dt.minute <= 16)]
# aggregate each day's pre-9:16 window into a single OHLC row
df_new = (df_start.groupby([df_start['date'].dt.year,
                            df_start['date'].dt.month,
                            df_start['date'].dt.day])
                  .agg(open_first=('open', lambda x: x.iloc[0]),
                       high_max=('high', 'max'),
                       low_min=('low', 'min'),
                       close_shift=('close', lambda x: x.iloc[-1])))
df_new.index.names = ['year', 'month', 'day']
df_new.reset_index(inplace=True)
df_new['date'] = (df_new['year'].astype(str) + '-' + df_new['month'].astype(str)
                  + '-' + df_new['day'].astype(str) + ' 09:16:00')
print(df_new)
year month day open_first high_max low_min close_shift date
0 2017 1 2 116.00 117.8 116.00 113.0 2017-1-2 09:16:00
1 2017 1 3 259.35 260.0 259.35 260.0 2017-1-3 09:16:00
2 2017 12 4 260.00 260.0 260.00 260.0 2017-12-4 09:16:00
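If you then want to fold these 09:16 bars back into the minute data, one way (a rough sketch continuing from df and df_new above; bars_916, mask, and result are names introduced here) is to keep the original rows from 09:17 onwards and append the aggregated rows:

# align column names and types with the original frame
df_new['date'] = pd.to_datetime(df_new['date'])
bars_916 = df_new.rename(columns={'open_first': 'open', 'high_max': 'high',
                                  'low_min': 'low', 'close_shift': 'close'})
bars_916 = bars_916[['date', 'open', 'high', 'low', 'close']]

# keep only the original rows from 09:17 onwards, then add the 09:16 bars back
mask = (df['date'].dt.hour > 9) | ((df['date'].dt.hour == 9) & (df['date'].dt.minute >= 17))
result = pd.concat([df[mask], bars_916]).sort_values('date').reset_index(drop=True)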