I have the following frame:
lst = [
['SPXW 250715C06310000', '7/14/2025', 2.74, 2.87, 2.60, 2.65, 14, '8:30:00'],
['SPXW 250715C06310000', '7/14/2025', 2.80, 2.80, 2.50, 2.53, 61, '8:31:00'],
['SPXW 250715C06310000', '7/14/2025', 2.45, 2.45, 2.45, 2.45, 2, '8:32:00'],
['SPXW 250715C06310000', '7/14/2025', 2.58, 2.80, 2.58, 2.60, 32, '8:33:00'],
['SPXW 250715C06310000', '7/14/2025', 2.50, 2.50, 2.25, 2.30, 5, '8:34:00'],
['SPXW 250709C06345000', '7/9/2025', 0.05, 0.05, 0.03, 0.03, 246, '8:30:00'],
['SPXW 250709C06345000', '7/9/2025', 0.05, 0.10, 0.03, 0.07, 452, '8:31:00'],
['SPXW 250709C06345000', '7/9/2025', 0.07, 0.10, 0.05, 0.07, 137, '8:32:00'],
['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.07, 0.07, 5, '8:33:00'],
['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.05, 0.05, 225, '8:34:00'],
['SPXW 250715C06310000', '7/11/2025', 7.30, 7.30, 7.30, 7.30, 2, '8:30:00'],
['SPXW 250715C06310000', '7/11/2025', 7.20, 7.20, 7.20, 7.20, 2, '8:31:00'],
['SPXW 250715C06310000', '7/11/2025', 6.92, 6.92, 6.92, 6.92, 20, '8:32:00'],
['SPXW 250715C06310000', '7/11/2025', 6.58, 6.58, 6.58, 6.58, 1, '8:34:00'],
['SPXW 250715C06310000', '7/11/2025', 6.41, 6.41, 6.41, 6.41, 2, '8:35:00']]
df = pd.DataFrame(
lst,
columns=['Symbol', 'Date', 'open', 'high', 'low', 'close', 'volume', 'hour'])
df['Date'] = pd.to_datetime(df['Date'])
df['hour'] = pd.to_datetime(df['hour'], format='%H:%M:%S')
df.set_index(['Symbol', 'Date'], inplace=True)
df["day_high"] = ""
df["day_low"] = ""
df["day_high_time"] = ""
df["day_low_time"] = ""
mask = df['hour'].dt.strftime('%H:%M').eq('08:30')
df.loc[mask, 'day_high'] = df.groupby(['Symbol','Date'])['high'].max()
df.loc[mask, 'day_low'] = df.groupby(['Symbol','Date'])['low'].min()
When run, it prints this:
open high low close volume hour day_high day_low day_high_time day_low_time
Symbol Date
SPXW 250715C06310000 2025-07-14 2.74 2.87 2.60 2.65 14 1900-01-01 08:30:00 2.87 2.25
2025-07-14 2.80 2.80 2.50 2.53 61 1900-01-01 08:31:00
2025-07-14 2.45 2.45 2.45 2.45 2 1900-01-01 08:32:00
2025-07-14 2.58 2.80 2.58 2.60 32 1900-01-01 08:33:00
2025-07-14 2.50 2.50 2.25 2.30 5 1900-01-01 08:34:00
SPXW 250709C06345000 2025-07-09 0.05 0.05 0.03 0.03 246 1900-01-01 08:30:00 0.1 0.03
2025-07-09 0.05 0.10 0.03 0.07 452 1900-01-01 08:31:00
2025-07-09 0.07 0.10 0.05 0.07 137 1900-01-01 08:32:00
2025-07-09 0.07 0.07 0.07 0.07 5 1900-01-01 08:33:00
2025-07-09 0.07 0.07 0.05 0.05 225 1900-01-01 08:34:00
SPXW 250715C06310000 2025-07-11 7.30 7.30 7.30 7.30 2 1900-01-01 08:30:00 7.3 6.41
2025-07-11 7.20 7.20 7.20 7.20 2 1900-01-01 08:31:00
2025-07-11 6.92 6.92 6.92 6.92 20 1900-01-01 08:32:00
2025-07-11 6.58 6.58 6.58 6.58 1 1900-01-01 08:34:00
2025-07-11 6.41 6.41 6.41 6.41 2 1900-01-01 08:35:00
My previous question was how to find max value and stick it in the day_high column where hour is equal to 8:30. It was achieved, along with day_low, by the lines of code above starting from mask =.
What I want to do now is find the time at which the max or min value occurred. I tried the following:
hour_loc = df.groupby(['Symbol', 'Date'])['high'].max()
df.loc[mask, 'day_high_time'] = df.loc[df['high'] == hour_loc, 'hour']
When I run that, I get an error:
Can only compare identically-labeled Series objects
In my previous problem about finding max value, another solution was to use the pd.groupby.transform method. Using the logic of that answer, I tried the following:
hour_loc_max = df.groupby(['Symbol', 'Date'])['high'].transform('max').where(mask)
hour_loc_min = df.groupby(['Symbol', 'Date'])['low'].transform('min').where(mask)
df.loc[mask, 'day_high_time'] = df.loc[df['high'] == hour_loc_max, 'hour']
df.loc[mask, 'day_low_time'] = df.loc[df['low'] == hour_loc_min, 'hour']
When run, it prints this:
open high low close volume hour day_high day_low day_high_time day_low_time
Symbol Date
SPXW 250715C06310000 2025-07-14 2.74 2.87 2.60 2.65 14 1900-01-01 08:30:00 2.87 2.25 1900-01-01 08:30:00 NaT
2025-07-14 2.80 2.80 2.50 2.53 61 1900-01-01 08:31:00
2025-07-14 2.45 2.45 2.45 2.45 2 1900-01-01 08:32:00
2025-07-14 2.58 2.80 2.58 2.60 32 1900-01-01 08:33:00
2025-07-14 2.50 2.50 2.25 2.30 5 1900-01-01 08:34:00
SPXW 250709C06345000 2025-07-09 0.05 0.05 0.03 0.03 246 1900-01-01 08:30:00 0.1 0.03 NaT NaT
2025-07-09 0.05 0.10 0.03 0.07 452 1900-01-01 08:31:00
2025-07-09 0.07 0.10 0.05 0.07 137 1900-01-01 08:32:00
2025-07-09 0.07 0.07 0.07 0.07 5 1900-01-01 08:33:00
2025-07-09 0.07 0.07 0.05 0.05 225 1900-01-01 08:34:00
SPXW 250715C06310000 2025-07-11 7.30 7.30 7.30 7.30 2 1900-01-01 08:30:00 7.3 6.41 1900-01-01 08:30:00 NaT
2025-07-11 7.20 7.20 7.20 7.20 2 1900-01-01 08:31:00
2025-07-11 6.92 6.92 6.92 6.92 20 1900-01-01 08:32:00
2025-07-11 6.58 6.58 6.58 6.58 1 1900-01-01 08:34:00
2025-07-11 6.41 6.41 6.41 6.41 2 1900-01-01 08:35:00
Here are my questions:
Why does the first method not work? It worked when I wanted to copy max value from high/low columns to day_high/day_low columns at 8:30.
What are NaT? All of the time values should be datetime objects.
My previous question was how to find max value, and stick it in the day_high column where hour is equal to 8:30.
Don't do that if you don't have to. It's square-peg-round-hole; what you are currently doing is a violation of normal form. Make a separate day summary frame.
This will be efficient enough:
import pandas as pd
df = pd.DataFrame(
columns=['symbol', 'date', 'open', 'high', 'low', 'close', 'volume', 'time'],
data=[
['SPXW 250715C06310000', '7/14/2025', 2.74, 2.87, 2.60, 2.65, 14, '8:30:00'],
['SPXW 250715C06310000', '7/14/2025', 2.80, 2.80, 2.50, 2.53, 61, '8:31:00'],
['SPXW 250715C06310000', '7/14/2025', 2.45, 2.45, 2.45, 2.45, 2, '8:32:00'],
['SPXW 250715C06310000', '7/14/2025', 2.58, 2.80, 2.58, 2.60, 32, '8:33:00'],
['SPXW 250715C06310000', '7/14/2025', 2.50, 2.50, 2.25, 2.30, 5, '8:34:00'],
['SPXW 250709C06345000', '7/9/2025', 0.05, 0.05, 0.03, 0.03, 246, '8:30:00'],
['SPXW 250709C06345000', '7/9/2025', 0.05, 0.10, 0.03, 0.07, 452, '8:31:00'],
['SPXW 250709C06345000', '7/9/2025', 0.07, 0.10, 0.05, 0.07, 137, '8:32:00'],
['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.07, 0.07, 5, '8:33:00'],
['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.05, 0.05, 225, '8:34:00'],
['SPXW 250715C06310000', '7/11/2025', 7.30, 7.30, 7.30, 7.30, 2, '8:30:00'],
['SPXW 250715C06310000', '7/11/2025', 7.20, 7.20, 7.20, 7.20, 2, '8:31:00'],
['SPXW 250715C06310000', '7/11/2025', 6.92, 6.92, 6.92, 6.92, 20, '8:32:00'],
['SPXW 250715C06310000', '7/11/2025', 6.58, 6.58, 6.58, 6.58, 1, '8:34:00'],
['SPXW 250715C06310000', '7/11/2025', 6.41, 6.41, 6.41, 6.41, 2, '8:35:00'],
],
)
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
summary = df.groupby(
by=['symbol', pd.Grouper(key='datetime', freq='D')],
).agg({'low': ['min', 'idxmin'], 'high': ['max', 'idxmax']})
summary.columns = 'low', 'low_idx', 'high', 'high_idx'
summary['low_time'] = df['datetime'].iloc[summary['low_idx']].values
summary['high_time'] = df['datetime'].iloc[summary['high_idx']].values
summary.drop(columns=['low_idx', 'high_idx'], inplace=True)
print(summary.T)
symbol SPXW 250709C06345000 SPXW 250715C06310000
datetime 2025-07-09 2025-07-11 2025-07-14
low 0.03 6.41 2.25
high 0.1 7.3 2.87
low_time 2025-07-09 08:30:00 2025-07-11 08:35:00 2025-07-14 08:34:00
high_time 2025-07-09 08:31:00 2025-07-11 08:30:00 2025-07-14 08:30:00
If you really need it back in a (bad) form that spreads the day summary across the original data, then you can reindex.
There is another style where you only aggregate using idx[min|max], which might theoretically speed the thing up if you have massive data:
idx = df.groupby(
by=['symbol', pd.Grouper(key='datetime', freq='D')],
).agg({'low': 'idxmin', 'high': 'idxmax'})
summary = pd.DataFrame(index=idx.index)
summary[['low', 'low_time']] = df[['low', 'datetime']].iloc[idx['low']].values
summary[['high', 'high_time']] = df[['high', 'datetime']].iloc[idx['high']].values
print(summary.T)
This is another way you can achieve this:
max_times = df.loc[df['high'].isin(max_values), 'hour']
max_times = max_times[~max_times.index.duplicated(keep="first")]
df.loc[mask, 'day_high_time'] = max_times.values
and also, for more efficiency change your above code to this:
df["hour"] = df["hour"].dt.strftime('%H:%M:%S')
mask = df['hour'].eq('08:30:00')
max_values = df.groupby(['Symbol','Date'])['high'].max()
df.loc[mask, 'day_high'] = max_values
min_values = df.groupby(['Symbol','Date'])['low'].min()
df.loc[mask, 'day_low'] = min_values
Instead of formatting the 'hour' column every time we want to use it, we can assign 'hour' to a formatted version of hour
Instead of just assigning df.loc[mask, 'day_high'] and 'day_low' to the max and min values, we first store them in variables so we can use it later in our code.
Explanation:
max_times = df.loc[df['high'].isin(max_values), 'hour']
We are setting a new variable called max_times to only the 'hour' values where df['high'] is in the max_values variable we created earlier which has the max price for each symbol and date. We have to use isin since max_values and df['high'] have different lengths.
max_times = max_times[~max_times.index.duplicated(keep="first")]
This line is only needed when two different hours have the same price since like in the second symbol and date there is two 0.1 's and two 0.03's.
This line only keeps the first hour if two hours are in the same date and symbol, meaning if 8:30 and 8:31 where there for the same symbol, it would only keep one of them. This is needed so we can assign these values to the data frame.
df.loc[mask, 'day_high_time'] = max_times.values
This is the exact same logic as in your day_high and day_low assignment, but since max_times is actually a series, we need to assign the values of max_times to the 'day_high_time' column.
For the low times, this is the code:
min_times = df.loc[df['low'].isin(min_values), 'hour']
min_times = min_times[~min_times.index.duplicated(keep="first")]
df.loc[mask, 'day_low_time'] = min_times.values
It's the same logic but for the minimum instead of maximum values.
If you want to reduce the code, you can do this instead:
# For Max Times
max_times = df.loc[df['high'].isin(max_values), 'hour']
df.loc[mask, 'day_high_time'] = max_times[~max_times.index.duplicated(keep="first")].values
# For Min Times
min_times = df.loc[df['low'].isin(min_values), 'hour']
df.loc[mask, 'day_low_time'] = min_times[~min_times.index.duplicated(keep="first")].values
You also asked what NaT values were, these stand for Not a Time and are NaN (Not a Number) values but for times.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With