I have minute-by-minute stock data from 2017 to 2019. I want to keep only data after 9:16 for each day, so I want to collapse all data between 9:00 and 9:16 into a single bar at 9:16, i.e.:
the value at 09:16 should be
open: value of the 1st data point from 9:00 - 9:16, here 116.00
high: highest value from 9:00 - 9:16, here 117.80
low: lowest value from 9:00 - 9:16, here 116.00
close: the value at 9:16 itself, here 113.00
open high low close
date
2017-01-02 09:08:00 116.00 116.00 116.00 116.00
2017-01-02 09:16:00 116.10 117.80 117.00 113.00
2017-01-02 09:17:00 115.50 116.20 115.50 116.20
2017-01-02 09:18:00 116.05 116.35 116.00 116.00
2017-01-02 09:19:00 116.00 116.00 115.60 115.75
... ... ... ... ...
2029-12-29 15:56:00 259.35 259.35 259.35 259.35
2019-12-29 15:57:00 260.00 260.00 260.00 260.00
2019-12-29 15:58:00 260.00 260.00 259.35 259.35
2019-12-29 15:59:00 260.00 260.00 260.00 260.00
2019-12-29 16:36:00 259.35 259.35 259.35 259.35
Here is what I tried:
# Get data from 9:00 - 9:16 and create only one data item
convertPreTrade = df.between_time("09:00", "09:16")  # 09:00 - 09:16
# combine the modified values into the original data
df.loc[df.index.strftime("%H:%M") == "09:16",
       ["open", "high", "low", "close"]] = [convertPreTrade["open"][0],
                                            convertPreTrade["high"].max(),
                                            convertPreTrade["low"].min(),
                                            convertPreTrade["close"][-1]]
But this won't give me accurate data.
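The problem is that between_time("09:00", "09:16") selects those minutes across every day at once, so convertPreTrade["open"][0] is the first open of the entire dataset, and the same four scalars get written into every day's 9:16 row. A minimal sketch with two hypothetical days (made-up values) shows the overwrite:

```python
import pandas as pd

# Two days of minute bars; hypothetical values for illustration.
idx = pd.to_datetime([
    "2017-01-02 09:08:00", "2017-01-02 09:16:00",
    "2017-01-03 09:05:00", "2017-01-03 09:16:00",
])
df = pd.DataFrame({"open": [116.0, 116.1, 200.0, 201.0],
                   "high": [116.0, 117.8, 200.5, 201.5],
                   "low":  [116.0, 117.0, 199.5, 200.5],
                   "close": [116.0, 113.0, 200.2, 201.2]}, index=idx)

pre = df.between_time("09:00", "09:16")
# pre spans BOTH days, so pre["open"].iloc[0] is day one's open (116.0),
# and the assignment below writes that same scalar into every 9:16 row:
df.loc[df.index.strftime("%H:%M") == "09:16", "open"] = pre["open"].iloc[0]
print(df.loc[df.index.strftime("%H:%M") == "09:16", "open"].tolist())
# both days now show 116.0 -- day two's open has been clobbered
```

The fix, as the answers below show, is to group by day before aggregating.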
d = {'date': 'last', 'open': 'first',
     'high': 'max', 'low': 'min', 'close': 'last'}
# df.index = pd.to_datetime(df.index)
s1 = df.between_time('09:00:00', '09:16:00')
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')
df1 = pd.concat([df.drop(s1.index), s2]).sort_index()
Use DataFrame.between_time to filter the rows in the dataframe df that fall between the times 09:00 and 09:16:
print(s1)
open high low close
date
2017-01-02 09:08:00 116.0 116.0 116.0 116.0
2017-01-02 09:16:00 116.1 117.8 117.0 113.0
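Note that between_time keeps both endpoints by default; in recent pandas versions (1.4+) the inclusive parameter ('both', 'left', 'right', 'neither') controls this. A quick check on a throwaway frame:

```python
import pandas as pd

# Four minutes spanning the window edge: 09:14 .. 09:17
idx = pd.date_range("2017-01-02 09:14", periods=4, freq="min")
df = pd.DataFrame({"open": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Both endpoints are kept by default, so 09:16 is included and 09:17 is not.
print(df.between_time("09:00", "09:16").index.strftime("%H:%M").tolist())
```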
Use DataFrame.groupby to group this filtered dataframe s1 on date and aggregate using the dictionary d:
print(s2)
open high low close
date
2017-01-02 09:16:00 116.0 117.8 116.0 113.0
Use DataFrame.drop to drop the rows from the original dataframe df that fall between 09:00-09:16, then use pd.concat to concatenate the result with s2, and finally use DataFrame.sort_index to sort the index:
print(df1)
open high low close
date
2017-01-02 09:16:00 116.00 117.80 116.00 113.00
2017-01-02 09:17:00 115.50 116.20 115.50 116.20
2017-01-02 09:18:00 116.05 116.35 116.00 116.00
2017-01-02 09:19:00 116.00 116.00 115.60 115.75
2019-12-29 15:57:00 260.00 260.00 260.00 260.00
2019-12-29 15:58:00 260.00 260.00 259.35 259.35
2019-12-29 15:59:00 260.00 260.00 260.00 260.00
2019-12-29 16:36:00 259.35 259.35 259.35 259.35
2029-12-29 15:56:00 259.35 259.35 259.35 259.35
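For reference, the three steps can be run end to end on the first few rows from the question (a self-contained sketch; 'open' is aggregated with 'first' here to match the requested "value of the 1st data point"):

```python
import io
import pandas as pd

data = """date,open,high,low,close
2017-01-02 09:08:00,116.00,116.00,116.00,116.00
2017-01-02 09:16:00,116.10,117.80,117.00,113.00
2017-01-02 09:17:00,115.50,116.20,115.50,116.20
2017-01-02 09:18:00,116.05,116.35,116.00,116.00
"""
df = pd.read_csv(io.StringIO(data), index_col="date", parse_dates=True)

# aggregation rules: first open, max high, min low, last close,
# and keep the last timestamp of the window (09:16) as the new index
d = {'date': 'last', 'open': 'first',
     'high': 'max', 'low': 'min', 'close': 'last'}

s1 = df.between_time('09:00:00', '09:16:00')
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')
df1 = pd.concat([df.drop(s1.index), s2]).sort_index()
print(df1)
```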
Extract the rows from 9:00 to 9:16, group them by year, month, and day, and compute the OHLC values over each group; the logic follows your code. Finally, add a date column set to 9:16. Since we don't have all your data we may have missed some edge cases, but the basic form remains the same.
import pandas as pd
import numpy as np
import io
data = '''
date open high low close
"2017-01-02 09:08:00" 116.00 116.00 116.00 116.00
"2017-01-02 09:16:00" 116.10 117.80 117.00 113.00
"2017-01-02 09:17:00" 115.50 116.20 115.50 116.20
"2017-01-02 09:18:00" 116.05 116.35 116.00 116.00
"2017-01-02 09:19:00" 116.00 116.00 115.60 115.75
"2017-01-03 09:08:00" 259.35 259.35 259.35 259.35
"2017-01-03 09:09:00" 260.00 260.00 260.00 260.00
"2017-12-03 09:18:00" 260.00 260.00 259.35 259.35
"2017-12-04 09:05:00" 260.00 260.00 260.00 260.00
"2017-12-04 09:22:00" 259.35 259.35 259.35 259.35
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
df.reset_index(drop=True, inplace=True)
df['date'] = pd.to_datetime(df['date'])
# 9:00-9:16
df_start = df[(df['date'].dt.hour == 9) & (df['date'].dt.minute <= 16)]
# calculate
df_new = (df_start.groupby([df['date'].dt.year, df['date'].dt.month, df['date'].dt.day])
                  .agg(open_first=('open', 'first'),
                       high_max=('high', 'max'),
                       low_min=('low', 'min'),
                       close_shift=('close', 'last')))
df_new.index.names = ['year', 'month', 'day']
df_new.reset_index(inplace=True)
df_new['date'] = df_new['year'].astype(str)+'-'+df_new['month'].astype(str)+'-'+df_new['day'].astype(str)+' 09:16:00'
year month day open_first high_max low_min close_shift date
0 2017 1 2 116.00 117.8 116.00 113.0 2017-1-2 09:16:00
1 2017 1 3 259.35 260.0 259.35 260.0 2017-1-3 09:16:00
2 2017 12 4 260.00 260.0 260.00 260.0 2017-12-4 09:16:00
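df_new now holds one aggregated 9:16 bar per day, but with a string date column and renamed OHLC columns. A possible way to stitch it back onto the rows after 9:16 (a sketch using small stand-in frames with the column names produced above):

```python
import pandas as pd

# Minimal stand-ins for the frames built above (values from the question).
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-02 09:08:00', '2017-01-02 09:16:00',
                            '2017-01-02 09:17:00']),
    'open': [116.00, 116.10, 115.50],
    'high': [116.00, 117.80, 116.20],
    'low':  [116.00, 117.00, 115.50],
    'close': [116.00, 113.00, 116.20]})
df_new = pd.DataFrame({'open_first': [116.00], 'high_max': [117.80],
                       'low_min': [116.00], 'close_shift': [113.00],
                       'date': ['2017-1-2 09:16:00']})

# Parse the assembled string back into a real timestamp...
df_new['date'] = pd.to_datetime(df_new['date'])
# ...rename the aggregate columns back to plain OHLC names...
bars = df_new.rename(columns={'open_first': 'open', 'high_max': 'high',
                              'low_min': 'low', 'close_shift': 'close'})
bars = bars[['date', 'open', 'high', 'low', 'close']]
# ...drop everything at or before 09:16 and prepend the new bars.
after = df[(df['date'].dt.hour > 9) |
           ((df['date'].dt.hour == 9) & (df['date'].dt.minute > 16))]
result = pd.concat([bars, after]).sort_values('date').reset_index(drop=True)
print(result)
```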