Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find max value of a column, then find another value in the same row, and copy that value to a new column [closed]

I have the following frame:

lst = [
    ['SPXW 250715C06310000', '7/14/2025', 2.74, 2.87, 2.60, 2.65, 14, '8:30:00'],
    ['SPXW 250715C06310000', '7/14/2025', 2.80, 2.80, 2.50, 2.53, 61, '8:31:00'],
    ['SPXW 250715C06310000', '7/14/2025', 2.45, 2.45, 2.45, 2.45, 2, '8:32:00'],
    ['SPXW 250715C06310000', '7/14/2025', 2.58, 2.80, 2.58, 2.60, 32, '8:33:00'],
    ['SPXW 250715C06310000', '7/14/2025', 2.50, 2.50, 2.25, 2.30, 5, '8:34:00'],
    ['SPXW 250709C06345000', '7/9/2025', 0.05, 0.05, 0.03, 0.03, 246, '8:30:00'],
    ['SPXW 250709C06345000', '7/9/2025', 0.05, 0.10, 0.03, 0.07, 452, '8:31:00'],
    ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.10, 0.05, 0.07, 137, '8:32:00'],
    ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.07, 0.07, 5, '8:33:00'],
    ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.05, 0.05, 225, '8:34:00'],
    ['SPXW 250715C06310000', '7/11/2025', 7.30, 7.30, 7.30, 7.30, 2, '8:30:00'],
    ['SPXW 250715C06310000', '7/11/2025', 7.20, 7.20, 7.20, 7.20, 2, '8:31:00'],
    ['SPXW 250715C06310000', '7/11/2025', 6.92, 6.92, 6.92, 6.92, 20, '8:32:00'],
    ['SPXW 250715C06310000', '7/11/2025', 6.58, 6.58, 6.58, 6.58, 1, '8:34:00'],
    ['SPXW 250715C06310000', '7/11/2025', 6.41, 6.41, 6.41, 6.41, 2, '8:35:00']]
df = pd.DataFrame(
    lst,
    columns=['Symbol', 'Date', 'open', 'high', 'low', 'close', 'volume', 'hour'])

df['Date'] = pd.to_datetime(df['Date'])
df['hour'] = pd.to_datetime(df['hour'], format='%H:%M:%S')
df.set_index(['Symbol', 'Date'], inplace=True)
df["day_high"] = ""
df["day_low"] = ""
df["day_high_time"] = ""
df["day_low_time"] = ""

mask = df['hour'].dt.strftime('%H:%M').eq('08:30')
df.loc[mask, 'day_high'] = df.groupby(['Symbol','Date'])['high'].max()
df.loc[mask, 'day_low'] = df.groupby(['Symbol','Date'])['low'].min()

When run, it prints this:

                                open  high   low  close  volume                hour day_high day_low day_high_time day_low_time
Symbol               Date                                                                                                       
SPXW 250715C06310000 2025-07-14  2.74  2.87  2.60   2.65      14 1900-01-01 08:30:00     2.87    2.25                           
                     2025-07-14  2.80  2.80  2.50   2.53      61 1900-01-01 08:31:00                                            
                     2025-07-14  2.45  2.45  2.45   2.45       2 1900-01-01 08:32:00                                            
                     2025-07-14  2.58  2.80  2.58   2.60      32 1900-01-01 08:33:00                                            
                     2025-07-14  2.50  2.50  2.25   2.30       5 1900-01-01 08:34:00                                            
SPXW 250709C06345000 2025-07-09  0.05  0.05  0.03   0.03     246 1900-01-01 08:30:00      0.1    0.03                           
                     2025-07-09  0.05  0.10  0.03   0.07     452 1900-01-01 08:31:00                                            
                     2025-07-09  0.07  0.10  0.05   0.07     137 1900-01-01 08:32:00                                            
                     2025-07-09  0.07  0.07  0.07   0.07       5 1900-01-01 08:33:00                                            
                     2025-07-09  0.07  0.07  0.05   0.05     225 1900-01-01 08:34:00                                            
SPXW 250715C06310000 2025-07-11  7.30  7.30  7.30   7.30       2 1900-01-01 08:30:00      7.3    6.41                           
                     2025-07-11  7.20  7.20  7.20   7.20       2 1900-01-01 08:31:00                                            
                     2025-07-11  6.92  6.92  6.92   6.92      20 1900-01-01 08:32:00                                            
                     2025-07-11  6.58  6.58  6.58   6.58       1 1900-01-01 08:34:00                                            
                     2025-07-11  6.41  6.41  6.41   6.41       2 1900-01-01 08:35:00

My previous question was how to find max value and stick it in the day_high column where hour is equal to 8:30. It was achieved, along with day_low, by the lines of code above starting from mask =.

What I want to do now is find the time at which the max or min value occurred. I tried the following:

hour_loc = df.groupby(['Symbol', 'Date'])['high'].max()
df.loc[mask, 'day_high_time'] = df.loc[df['high'] == hour_loc, 'hour']

When I run that, I get an error:

Can only compare identically-labeled Series objects

In my previous problem about finding max value, another solution was to use the pd.groupby.transform method. Using the logic of that answer, I tried the following:

hour_loc_max = df.groupby(['Symbol', 'Date'])['high'].transform('max').where(mask)
hour_loc_min = df.groupby(['Symbol', 'Date'])['low'].transform('min').where(mask)
df.loc[mask, 'day_high_time'] = df.loc[df['high'] == hour_loc_max, 'hour']
df.loc[mask, 'day_low_time'] = df.loc[df['low'] == hour_loc_min, 'hour']

When run, it prints this:

                                open  high   low  close  volume                hour day_high day_low        day_high_time day_low_time
Symbol               Date                                                                                                              
SPXW 250715C06310000 2025-07-14  2.74  2.87  2.60   2.65      14 1900-01-01 08:30:00     2.87    2.25  1900-01-01 08:30:00          NaT
                     2025-07-14  2.80  2.80  2.50   2.53      61 1900-01-01 08:31:00                                                   
                     2025-07-14  2.45  2.45  2.45   2.45       2 1900-01-01 08:32:00                                                   
                     2025-07-14  2.58  2.80  2.58   2.60      32 1900-01-01 08:33:00                                                   
                     2025-07-14  2.50  2.50  2.25   2.30       5 1900-01-01 08:34:00                                                   
SPXW 250709C06345000 2025-07-09  0.05  0.05  0.03   0.03     246 1900-01-01 08:30:00      0.1    0.03                  NaT          NaT
                     2025-07-09  0.05  0.10  0.03   0.07     452 1900-01-01 08:31:00                                                   
                     2025-07-09  0.07  0.10  0.05   0.07     137 1900-01-01 08:32:00                                                   
                     2025-07-09  0.07  0.07  0.07   0.07       5 1900-01-01 08:33:00                                                   
                     2025-07-09  0.07  0.07  0.05   0.05     225 1900-01-01 08:34:00                                                   
SPXW 250715C06310000 2025-07-11  7.30  7.30  7.30   7.30       2 1900-01-01 08:30:00      7.3    6.41  1900-01-01 08:30:00          NaT
                     2025-07-11  7.20  7.20  7.20   7.20       2 1900-01-01 08:31:00                                                   
                     2025-07-11  6.92  6.92  6.92   6.92      20 1900-01-01 08:32:00                                                   
                     2025-07-11  6.58  6.58  6.58   6.58       1 1900-01-01 08:34:00                                                   
                     2025-07-11  6.41  6.41  6.41   6.41       2 1900-01-01 08:35:00      

Here are my questions:

  1. Why does the first method not work? It worked when I wanted to copy max value from high/low columns to day_high/day_low columns at 8:30.

  2. What are NaT? All of the time values should be datetime objects.

like image 377
Dan Avatar asked Oct 30 '25 02:10

Dan


2 Answers

My previous question was how to find max value, and stick it in the day_high column where hour is equal to 8:30.

Don't do that if you don't have to. It's square-peg-round-hole; what you are currently doing is a violation of normal form. Make a separate day summary frame.

This will be efficient enough:

import pandas as pd

df = pd.DataFrame(
    columns=['symbol', 'date', 'open', 'high', 'low', 'close', 'volume', 'time'],
    data=[
        ['SPXW 250715C06310000', '7/14/2025', 2.74, 2.87, 2.60, 2.65, 14,  '8:30:00'],
        ['SPXW 250715C06310000', '7/14/2025', 2.80, 2.80, 2.50, 2.53, 61,  '8:31:00'],
        ['SPXW 250715C06310000', '7/14/2025', 2.45, 2.45, 2.45, 2.45, 2,   '8:32:00'],
        ['SPXW 250715C06310000', '7/14/2025', 2.58, 2.80, 2.58, 2.60, 32,  '8:33:00'],
        ['SPXW 250715C06310000', '7/14/2025', 2.50, 2.50, 2.25, 2.30, 5,   '8:34:00'],
        ['SPXW 250709C06345000',  '7/9/2025', 0.05, 0.05, 0.03, 0.03, 246, '8:30:00'],
        ['SPXW 250709C06345000',  '7/9/2025', 0.05, 0.10, 0.03, 0.07, 452, '8:31:00'],
        ['SPXW 250709C06345000',  '7/9/2025', 0.07, 0.10, 0.05, 0.07, 137, '8:32:00'],
        ['SPXW 250709C06345000',  '7/9/2025', 0.07, 0.07, 0.07, 0.07, 5,   '8:33:00'],
        ['SPXW 250709C06345000',  '7/9/2025', 0.07, 0.07, 0.05, 0.05, 225, '8:34:00'],
        ['SPXW 250715C06310000', '7/11/2025', 7.30, 7.30, 7.30, 7.30, 2,   '8:30:00'],
        ['SPXW 250715C06310000', '7/11/2025', 7.20, 7.20, 7.20, 7.20, 2,   '8:31:00'],
        ['SPXW 250715C06310000', '7/11/2025', 6.92, 6.92, 6.92, 6.92, 20,  '8:32:00'],
        ['SPXW 250715C06310000', '7/11/2025', 6.58, 6.58, 6.58, 6.58, 1,   '8:34:00'],
        ['SPXW 250715C06310000', '7/11/2025', 6.41, 6.41, 6.41, 6.41, 2,   '8:35:00'],
    ],
)
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])

summary = df.groupby(
    by=['symbol', pd.Grouper(key='datetime', freq='D')],
).agg({'low': ['min', 'idxmin'], 'high': ['max', 'idxmax']})
summary.columns = 'low', 'low_idx', 'high', 'high_idx'

summary['low_time'] = df['datetime'].iloc[summary['low_idx']].values
summary['high_time'] = df['datetime'].iloc[summary['high_idx']].values
summary.drop(columns=['low_idx', 'high_idx'], inplace=True)
print(summary.T)
symbol    SPXW 250709C06345000 SPXW 250715C06310000                     
datetime            2025-07-09           2025-07-11           2025-07-14
low                       0.03                 6.41                 2.25
high                       0.1                  7.3                 2.87
low_time   2025-07-09 08:30:00  2025-07-11 08:35:00  2025-07-14 08:34:00
high_time  2025-07-09 08:31:00  2025-07-11 08:30:00  2025-07-14 08:30:00

If you really need it back in a (bad) form that spreads the day summary across the original data, then you can reindex.

There is another style where you only aggregate using idx[min|max], which might theoretically speed the thing up if you have massive data:

idx = df.groupby(
    by=['symbol', pd.Grouper(key='datetime', freq='D')],
).agg({'low': 'idxmin', 'high': 'idxmax'})

summary = pd.DataFrame(index=idx.index)
summary[['low', 'low_time']] = df[['low', 'datetime']].iloc[idx['low']].values
summary[['high', 'high_time']] = df[['high', 'datetime']].iloc[idx['high']].values
print(summary.T)
like image 192
Reinderien Avatar answered Nov 01 '25 15:11

Reinderien


This is another way you can achieve this:

max_times = df.loc[df['high'].isin(max_values), 'hour']
max_times = max_times[~max_times.index.duplicated(keep="first")]
df.loc[mask, 'day_high_time'] = max_times.values

and also, for more efficiency change your above code to this:

df["hour"] = df["hour"].dt.strftime('%H:%M:%S')
mask = df['hour'].eq('08:30:00')

max_values = df.groupby(['Symbol','Date'])['high'].max()
df.loc[mask, 'day_high'] = max_values

min_values = df.groupby(['Symbol','Date'])['low'].min()
df.loc[mask, 'day_low'] = min_values
  1. Instead of formatting the 'hour' column every time we want to use it, we can assign 'hour' to a formatted version of hour

  2. Instead of just assigning df.loc[mask, 'day_high'] and 'day_low' to the max and min values, we first store them in variables so we can use it later in our code.

Explanation:

max_times = df.loc[df['high'].isin(max_values), 'hour']

We are setting a new variable called max_times to only the 'hour' values where df['high'] is in the max_values variable we created earlier which has the max price for each symbol and date. We have to use isin since max_values and df['high'] have different lengths.

max_times = max_times[~max_times.index.duplicated(keep="first")]

This line is only needed when two different hours have the same price since like in the second symbol and date there is two 0.1 's and two 0.03's.

This line only keeps the first hour if two hours are in the same date and symbol, meaning if 8:30 and 8:31 where there for the same symbol, it would only keep one of them. This is needed so we can assign these values to the data frame.

df.loc[mask, 'day_high_time'] = max_times.values

This is the exact same logic as in your day_high and day_low assignment, but since max_times is actually a series, we need to assign the values of max_times to the 'day_high_time' column.

For the low times, this is the code:

min_times = df.loc[df['low'].isin(min_values), 'hour']
min_times = min_times[~min_times.index.duplicated(keep="first")]
df.loc[mask, 'day_low_time'] = min_times.values

It's the same logic but for the minimum instead of maximum values.

If you want to reduce the code, you can do this instead:

# For Max Times
max_times = df.loc[df['high'].isin(max_values), 'hour']
df.loc[mask, 'day_high_time'] = max_times[~max_times.index.duplicated(keep="first")].values

# For Min Times
min_times = df.loc[df['low'].isin(min_values), 'hour']
df.loc[mask, 'day_low_time'] = min_times[~min_times.index.duplicated(keep="first")].values

You also asked what NaT values were, these stand for Not a Time and are NaN (Not a Number) values but for times.

like image 26
Aadvik Avatar answered Nov 01 '25 17:11

Aadvik