I have minute-by-minute stock data from 2017 to 2019. I want to keep only data after 9:16 for each day, so I want to collapse all data between 9:00 and 9:16 into a single bar at 9:16, i.e.:
the value at 09:16 should be
open: value of the 1st data point from 9:00 - 9:16, here 116.00
high: highest value from 9:00 - 9:16, here 117.80
low: lowest value from 9:00 - 9:16, here 116.00
close: the value at 9:16 itself, here 113.00
open high low close
date
2017-01-02 09:08:00 116.00 116.00 116.00 116.00
2017-01-02 09:16:00 116.10 117.80 117.00 113.00
2017-01-02 09:17:00 115.50 116.20 115.50 116.20
2017-01-02 09:18:00 116.05 116.35 116.00 116.00
2017-01-02 09:19:00 116.00 116.00 115.60 115.75
... ... ... ... ...
2029-12-29 15:56:00 259.35 259.35 259.35 259.35
2019-12-29 15:57:00 260.00 260.00 260.00 260.00
2019-12-29 15:58:00 260.00 260.00 259.35 259.35
2019-12-29 15:59:00 260.00 260.00 260.00 260.00
2019-12-29 16:36:00 259.35 259.35 259.35 259.35
Here is what I tried:
# Get data from 9:00 - 9:16 and create only one data item
convertPreTrade = df.between_time("09:00", "09:16")  # 09:00 - 09:16
# combine the modified values into the original data
df.loc[df.index.strftime("%H:%M") == "09:16",
       ["open", "high", "low", "close"]] = [convertPreTrade["open"][0],
                                            convertPreTrade["high"].max(),
                                            convertPreTrade["low"].min(),
                                            convertPreTrade["close"][-1]]
But this won't give me accurate data.
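The problem is that between_time("09:00", "09:16") selects those minutes across every day at once, so convertPreTrade["open"][0] is the first open of the entire dataset, and the same four scalars get written into every day's 9:16 row. A minimal sketch with two hypothetical days (made-up values) shows the overwrite:

```python
import pandas as pd

# Two days of minute bars; hypothetical values for illustration.
idx = pd.to_datetime([
    "2017-01-02 09:08:00", "2017-01-02 09:16:00",
    "2017-01-03 09:05:00", "2017-01-03 09:16:00",
])
df = pd.DataFrame({"open": [116.0, 116.1, 200.0, 201.0],
                   "high": [116.0, 117.8, 200.5, 201.5],
                   "low":  [116.0, 117.0, 199.5, 200.5],
                   "close": [116.0, 113.0, 200.2, 201.2]}, index=idx)

pre = df.between_time("09:00", "09:16")
# pre spans BOTH days, so pre["open"].iloc[0] is day one's open (116.0),
# and the assignment below writes that same scalar into every 9:16 row:
df.loc[df.index.strftime("%H:%M") == "09:16", "open"] = pre["open"].iloc[0]
print(df.loc[df.index.strftime("%H:%M") == "09:16", "open"].tolist())
# both days now show 116.0 -- day two's open has been clobbered
```

The fix, as the answers below show, is to group by day before aggregating.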
d = {'date': 'last', 'open': 'first',
     'high': 'max', 'low': 'min', 'close': 'last'}
# df.index = pd.to_datetime(df.index)
s1 = df.between_time('09:00:00', '09:16:00')
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')
df1 = pd.concat([df.drop(s1.index), s2]).sort_index()
Use DataFrame.between_time to filter the rows in the dataframe df that fall between the times 09:00 and 09:16:
print(s1)
open high low close
date
2017-01-02 09:08:00 116.0 116.0 116.0 116.0
2017-01-02 09:16:00 116.1 117.8 117.0 113.0
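Note that between_time keeps both endpoints by default; in recent pandas versions (1.4+) the inclusive parameter ('both', 'left', 'right', 'neither') controls this. A quick check on a throwaway frame:

```python
import pandas as pd

# Four minutes spanning the window edge: 09:14 .. 09:17
idx = pd.date_range("2017-01-02 09:14", periods=4, freq="min")
df = pd.DataFrame({"open": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Both endpoints are kept by default, so 09:16 is included and 09:17 is not.
print(df.between_time("09:00", "09:16").index.strftime("%H:%M").tolist())
```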
Use DataFrame.groupby to group this filtered dataframe s1 on date and aggregate using the dictionary d:
print(s2)
open high low close
date
2017-01-02 09:16:00 116.0 117.8 116.0 113.0
Use DataFrame.drop to drop the rows from the original dataframe df that fall between 09:00-09:16, then use pd.concat to concatenate the result with s2, and finally use DataFrame.sort_index to sort the index:
print(df1)
open high low close
date
2017-01-02 09:16:00 116.00 117.80 116.00 113.00
2017-01-02 09:17:00 115.50 116.20 115.50 116.20
2017-01-02 09:18:00 116.05 116.35 116.00 116.00
2017-01-02 09:19:00 116.00 116.00 115.60 115.75
2019-12-29 15:57:00 260.00 260.00 260.00 260.00
2019-12-29 15:58:00 260.00 260.00 259.35 259.35
2019-12-29 15:59:00 260.00 260.00 260.00 260.00
2019-12-29 16:36:00 259.35 259.35 259.35 259.35
2029-12-29 15:56:00 259.35 259.35 259.35 259.35
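For reference, the three steps can be run end to end on the first few rows from the question (a self-contained sketch; 'open' is aggregated with 'first' here to match the requested "value of the 1st data point"):

```python
import io
import pandas as pd

data = """date,open,high,low,close
2017-01-02 09:08:00,116.00,116.00,116.00,116.00
2017-01-02 09:16:00,116.10,117.80,117.00,113.00
2017-01-02 09:17:00,115.50,116.20,115.50,116.20
2017-01-02 09:18:00,116.05,116.35,116.00,116.00
"""
df = pd.read_csv(io.StringIO(data), index_col="date", parse_dates=True)

# aggregation rules: first open, max high, min low, last close,
# and keep the last timestamp of the window (09:16) as the new index
d = {'date': 'last', 'open': 'first',
     'high': 'max', 'low': 'min', 'close': 'last'}

s1 = df.between_time('09:00:00', '09:16:00')
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')
df1 = pd.concat([df.drop(s1.index), s2]).sort_index()
print(df1)
```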
Extract the rows from 9:00 to 9:16, group them by year, month, and day, and compute the OHLC values over each group; the logic follows your code. Finally, add a date column set to 9:16. Since we don't have all your data we may have missed some edge cases, but the basic form remains the same.
import pandas as pd
import numpy as np
import io
data = '''
date open high low close
"2017-01-02 09:08:00" 116.00 116.00 116.00 116.00
"2017-01-02 09:16:00" 116.10 117.80 117.00 113.00
"2017-01-02 09:17:00" 115.50 116.20 115.50 116.20
"2017-01-02 09:18:00" 116.05 116.35 116.00 116.00
"2017-01-02 09:19:00" 116.00 116.00 115.60 115.75
"2017-01-03 09:08:00" 259.35 259.35 259.35 259.35
"2017-01-03 09:09:00" 260.00 260.00 260.00 260.00
"2017-12-03 09:18:00" 260.00 260.00 259.35 259.35
"2017-12-04 09:05:00" 260.00 260.00 260.00 260.00
"2017-12-04 09:22:00" 259.35 259.35 259.35 259.35
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
df.reset_index(drop=True, inplace=True)
df['date'] = pd.to_datetime(df['date'])
# 9:00-9:16
df_start = df[(df['date'].dt.hour == 9) & (df['date'].dt.minute <= 16)]
# calculate
df_new = (df_start.groupby([df['date'].dt.year, df['date'].dt.month, df['date'].dt.day])
                  .agg(open_first=('open', 'first'),
                       high_max=('high', 'max'),
                       low_min=('low', 'min'),
                       close_shift=('close', 'last')))
df_new.index.names = ['year', 'month', 'day']
df_new.reset_index(inplace=True)
df_new['date'] = df_new['year'].astype(str)+'-'+df_new['month'].astype(str)+'-'+df_new['day'].astype(str)+' 09:16:00'
year month day open_first high_max low_min close_shift date
0 2017 1 2 116.00 117.8 116.00 113.0 2017-1-2 09:16:00
1 2017 1 3 259.35 260.0 259.35 260.0 2017-1-3 09:16:00
2 2017 12 4 260.00 260.0 260.00 260.0 2017-12-4 09:16:00
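df_new now holds one aggregated 9:16 bar per day, but with a string date column and renamed OHLC columns. A possible way to stitch it back onto the rows after 9:16 (a sketch using small stand-in frames with the column names produced above):

```python
import pandas as pd

# Minimal stand-ins for the frames built above (values from the question).
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-02 09:08:00', '2017-01-02 09:16:00',
                            '2017-01-02 09:17:00']),
    'open': [116.00, 116.10, 115.50],
    'high': [116.00, 117.80, 116.20],
    'low':  [116.00, 117.00, 115.50],
    'close': [116.00, 113.00, 116.20]})
df_new = pd.DataFrame({'open_first': [116.00], 'high_max': [117.80],
                       'low_min': [116.00], 'close_shift': [113.00],
                       'date': ['2017-1-2 09:16:00']})

# Parse the assembled string back into a real timestamp...
df_new['date'] = pd.to_datetime(df_new['date'])
# ...rename the aggregate columns back to plain OHLC names...
bars = df_new.rename(columns={'open_first': 'open', 'high_max': 'high',
                              'low_min': 'low', 'close_shift': 'close'})
bars = bars[['date', 'open', 'high', 'low', 'close']]
# ...drop everything at or before 09:16 and prepend the new bars.
after = df[(df['date'].dt.hour > 9) |
           ((df['date'].dt.hour == 9) & (df['date'].dt.minute > 16))]
result = pd.concat([bars, after]).sort_values('date').reset_index(drop=True)
print(result)
```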