
pandas combine stock data if it falls between specific time only in dataframe

I have minute-by-minute stock data from 2017 to 2019. I want to keep only data from 9:16 onwards for each day, so I want to collapse all rows between 9:00 and 9:16 into the single 9:16 row, i.e.:

the value of 09:16 should be

  • open : the first open from 9:00 - 9:16, here 116.00
  • high : the highest high from 9:00 - 9:16, here 117.80
  • low : the lowest low from 9:00 - 9:16, here 116.00
  • close : the close at 9:16, here 113.00

                       open    high     low   close
date                                               
2017-01-02 09:08:00  116.00  116.00  116.00  116.00
2017-01-02 09:16:00  116.10  117.80  117.00  113.00
2017-01-02 09:17:00  115.50  116.20  115.50  116.20
2017-01-02 09:18:00  116.05  116.35  116.00  116.00
2017-01-02 09:19:00  116.00  116.00  115.60  115.75
...                     ...     ...     ...     ...
2019-12-29 15:56:00  259.35  259.35  259.35  259.35
2019-12-29 15:57:00  260.00  260.00  260.00  260.00
2019-12-29 15:58:00  260.00  260.00  259.35  259.35
2019-12-29 15:59:00  260.00  260.00  260.00  260.00
2019-12-29 16:36:00  259.35  259.35  259.35  259.35
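The intended aggregation can be sketched on the first two rows of the table above (a minimal sketch, not the asker's code):

```python
import pandas as pd

# The two rows of the 09:00-09:16 window from the table above
idx = pd.to_datetime(["2017-01-02 09:08:00", "2017-01-02 09:16:00"])
win = pd.DataFrame({"open":  [116.00, 116.10],
                    "high":  [116.00, 117.80],
                    "low":   [116.00, 117.00],
                    "close": [116.00, 113.00]}, index=idx)

bar = {"open":  float(win["open"].iloc[0]),    # first open of the window
       "high":  float(win["high"].max()),      # highest high
       "low":   float(win["low"].min()),       # lowest low
       "close": float(win["close"].iloc[-1])}  # close at 09:16
print(bar)  # {'open': 116.0, 'high': 117.8, 'low': 116.0, 'close': 113.0}
```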

Here is what I tried :

#Get data from/to 9:00 - 9:16 and create only one data item

convertPreTrade = df.between_time("09:00", "09:16") #09:00 - 09:16

#combine modified value to original data

df.loc[df.index.strftime("%H:%M") == "09:16" , 
    ["open","high","low","close"] ] = [convertPreTrade["open"][0],
                                        convertPreTrade["high"].max(),
                                        convertPreTrade["low"].min(),
                                        convertPreTrade['close'][-1] ] 

but this doesn't give me accurate data
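The reason is visible on a two-day toy frame (hypothetical values): `between_time` returns the 09:00-09:16 rows of *every* day at once, so `convertPreTrade["open"][0]` is the first open of the entire history, `["close"][-1]` is the last close of the entire history, and the same four values get written to every day's 09:16 row.

```python
import pandas as pd

idx = pd.to_datetime(["2017-01-02 09:08:00", "2017-01-02 09:16:00",
                      "2017-01-03 09:05:00", "2017-01-03 09:16:00"])
df = pd.DataFrame({"close": [116.0, 113.0, 250.0, 251.0]}, index=idx)

pre = df.between_time("09:00", "09:16")
print(len(pre))               # 4 -- rows from BOTH days, not one
print(pre["close"].iloc[-1])  # 251.0 -- last close of the whole history
```

Any correct solution therefore has to group the window by calendar date before aggregating, which is what both answers below do.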

asked Aug 30 '20 by amitnair92

2 Answers


# per-day aggregation: first open, max high, min low, last close
d = {'date': 'last', 'open': 'first',
     'high': 'max', 'low': 'min', 'close': 'last'}

# df.index = pd.to_datetime(df.index)
s1 = df.between_time('09:00:00', '09:16:00')
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index('date')

df1 = pd.concat([df.drop(s1.index), s2]).sort_index()

Details:

Use DataFrame.between_time to filter the rows in the dataframe df that fall between 09:00 and 09:16:

print(s1)
                      open   high    low  close
date                                           
2017-01-02 09:08:00  116.0  116.0  116.0  116.0
2017-01-02 09:16:00  116.1  117.8  117.0  113.0

Use DataFrame.groupby to group this filtered dataframe s1 by date and aggregate using the dictionary d ('first' for open keeps the first value of the window, per the question's requirement):

print(s2)
                      open   high    low  close
date                                           
2017-01-02 09:16:00  116.0  117.8  116.0  113.0

Use DataFrame.drop to drop the rows from the original dataframe df that fall between 09:00 and 09:16, then use pd.concat to concatenate it with s2, and finally use DataFrame.sort_index to sort the index:

print(df1)
                       open    high     low   close
date                                               
2017-01-02 09:16:00  116.00  117.80  116.00  113.00
2017-01-02 09:17:00  115.50  116.20  115.50  116.20
2017-01-02 09:18:00  116.05  116.35  116.00  116.00
2017-01-02 09:19:00  116.00  116.00  115.60  115.75
2019-12-29 15:56:00  259.35  259.35  259.35  259.35
2019-12-29 15:57:00  260.00  260.00  260.00  260.00
2019-12-29 15:58:00  260.00  260.00  259.35  259.35
2019-12-29 15:59:00  260.00  260.00  260.00  260.00
2019-12-29 16:36:00  259.35  259.35  259.35  259.35
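The pipeline can be checked end to end on a minimal frame built from the question's first three rows (a self-contained sketch of the same steps):

```python
import io
import pandas as pd

data = """date,open,high,low,close
2017-01-02 09:08:00,116.00,116.00,116.00,116.00
2017-01-02 09:16:00,116.10,117.80,117.00,113.00
2017-01-02 09:17:00,115.50,116.20,115.50,116.20
"""
df = pd.read_csv(io.StringIO(data), index_col="date", parse_dates=True)

d = {"date": "last", "open": "first",
     "high": "max", "low": "min", "close": "last"}
s1 = df.between_time("09:00", "09:16")          # the 09:00-09:16 window
s2 = s1.reset_index().groupby(s1.index.date).agg(d).set_index("date")
df1 = pd.concat([df.drop(s1.index), s2]).sort_index()

print(df1)  # one aggregated 09:16 bar, then the untouched 09:17 row
```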

answered Oct 13 '22 by Shubham Sharma

Extract the rows from 9:00 to 9:16, group them by year, month, and day, and compute the OHLC values; the aggregation logic follows your code. Finally, add a date column set to 9:16. Since we don't have all the data, some considerations may be missing, but the basic form remains the same.

import pandas as pd
import io

data = '''
date open high low close
"2017-01-02 09:08:00"  116.00  116.00  116.00  116.00
"2017-01-02 09:16:00"  116.10  117.80  117.00  113.00
"2017-01-02 09:17:00"  115.50  116.20  115.50  116.20
"2017-01-02 09:18:00"  116.05  116.35  116.00  116.00
"2017-01-02 09:19:00"  116.00  116.00  115.60  115.75
"2017-01-03 09:08:00"  259.35  259.35  259.35  259.35
"2017-01-03 09:09:00"  260.00  260.00  260.00  260.00
"2017-12-03 09:18:00"  260.00  260.00  259.35  259.35
"2017-12-04 09:05:00"  260.00  260.00  260.00  260.00
"2017-12-04 09:22:00"  259.35  259.35  259.35  259.35
'''

df = pd.read_csv(io.StringIO(data), sep=r'\s+')

df.reset_index(drop=True, inplace=True)
df['date'] = pd.to_datetime(df['date'])
# keep rows between 09:00 and 09:16
df_start = df[(df['date'].dt.hour == 9) & (df['date'].dt.minute <= 16)]
# aggregate per day: first open, max high, min low, last close
df_new = (df_start.groupby([df['date'].dt.year, df['date'].dt.month, df['date'].dt.day])
            .agg(open_first=('open', lambda x: x.iloc[0]),
                 high_max=('high', 'max'),
                 low_min=('low', 'min'),
                 close_shift=('close', lambda x: x.iloc[-1])))
df_new.index.names = ['year', 'month', 'day']
df_new.reset_index(inplace=True)
df_new['date'] = df_new['year'].astype(str)+'-'+df_new['month'].astype(str)+'-'+df_new['day'].astype(str)+' 09:16:00'

   year  month  day  open_first  high_max  low_min  close_shift                date
0  2017      1    2      116.00     117.8   116.00        113.0   2017-1-2 09:16:00
1  2017      1    3      259.35     260.0   259.35        260.0   2017-1-3 09:16:00
2  2017     12    4      260.00     260.0   260.00        260.0  2017-12-4 09:16:00
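The answer stops at df_new, whose date column is still a plain string. One possible way to turn it back into a time-indexed frame with the original OHLC column names (a sketch; the single-row df_new here is hypothetical, in the shape produced above, and splicing it back into df is still left out):

```python
import pandas as pd

# Hypothetical aggregated row in the shape produced above
df_new = pd.DataFrame({
    "open_first": [116.00], "high_max": [117.8],
    "low_min": [116.00], "close_shift": [113.0],
    "date": ["2017-1-2 09:16:00"],
})

# Parse the string 'date' column into a proper DatetimeIndex and
# restore the original OHLC column names
bars = (df_new.assign(date=pd.to_datetime(df_new["date"]))
              .set_index("date")
              .rename(columns={"open_first": "open", "high_max": "high",
                               "low_min": "low", "close_shift": "close"}))
print(bars.index[0])  # 2017-01-02 09:16:00
```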
answered Oct 13 '22 by r-beginners