Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas find last value before a given timestamp

Tags:

python

pandas

For the dataframe below, I am trying to add a column to each row that captures the ask_size at different time intervals, for the sake of example, say 1 millisecond.

So for instance, for row 1, the size 1ms before should be 165 since that is the prevailing ask size 1ms before - even though the previous timestamp (2024-02-12 09:00:00.178941829) was way before, it is still the **prevailing ** size 1 millisecond before.

For another example, row 3 to 8 should be all 203, since that is the size at timestamp 2024-02-12 09:00:00.334723166, which would be the last timestamp 1ms before row 3 to 8.

Been reading up on merge_asof, tried a few things below, but no luck. Any help appreciated!

Table example

idx event_timestamp                 ask_size
0   2024-02-12 09:00:00.178941829   165
1   2024-02-12 09:00:00.334673928   166
2   2024-02-12 09:00:00.334723166   203
3   2024-02-12 09:00:00.339505589   203
4   2024-02-12 09:00:00.339517572   241
5   2024-02-12 09:00:00.339585194   276
6   2024-02-12 09:00:00.339597200   276
7   2024-02-12 09:00:00.339679756   277
8   2024-02-12 09:00:00.339705796   312
9   2024-02-12 09:00:00.343967540   275
10  2024-02-12 09:00:00.393306026   275

Raw DATA

data = {
    'event_timestamp': ['2024-02-12 09:00:00.178941829', '2024-02-12 09:00:00.334673928',
                        '2024-02-12 09:00:00.334723166', '2024-02-12 09:00:00.339505589',
                        '2024-02-12 09:00:00.339517572', '2024-02-12 09:00:00.339585194',
                        '2024-02-12 09:00:00.339597200', '2024-02-12 09:00:00.339679756',
                        '2024-02-12 09:00:00.339705796', '2024-02-12 09:00:00.343967540'],
    'ask_size_1_x': [165.0, 166.0, 203.0, 203.0, 241.0, 276.0, 276.0, 277.0, 312.0, 275.0]
}

df = pd.DataFrame(data)

Attempt

data['1ms'] = data['event_timestamp'] - pd.Timedelta(milliseconds=1)

temp = data[['event_timestamp','ask_size_1']]
temp_time_shift = data[['1ms','ask_size_1']]



temp2 = pd.merge_asof(
            temp,
            temp_time_shift,
            left_on = 'event_timestamp',
            right_on = '1ms',
            direction='backward'
        )

EDIT Suggestion:

import pandas as pd

data = {
    'event_timestamp': [
        '2024-02-12 09:00:00.393306026',
        '2024-02-12 09:00:00.393347792',
        '2024-02-12 09:00:00.393351971',
        '2024-02-12 09:00:00.393355738',
        '2024-02-12 09:00:00.393389724',
        '2024-02-12 09:00:00.542780521',
        '2024-02-12 09:00:00.542841917',
        '2024-02-12 09:00:00.714845055',
        '2024-02-12 09:00:00.714908862',
        '2024-02-12 09:00:00.747016524'
    ],
    'ask_size_1': [275.0, 275.0, 237.0, 237.0, 202.0, 202.0, 202.0, 262.0, 261.0, 263.0]
}

df = pd.DataFrame(data)
df['event_timestamp'] = pd.to_datetime(df['event_timestamp'])  # Convert 'event_timestamp' to datetime format

tolerance = pd.Timedelta('1ms')
df['out'] = pd.merge_asof(df['event_timestamp'].sub(tolerance),
                          df[['event_timestamp', 'ask_size_1']],
                          direction='forward', tolerance=tolerance
                         )['ask_size_1']

The output is the below, you can see row 7 for instance, both the ask_size and out are the same. The out should be the last ask_size at least 1ms before row 7, which would be row 6, with a value of 202.

Looking at it the yellow could technically be NaN since there is no value at a timestamp greater than 1ms before.

    event_timestamp             ask_size_1  out
0   2024-02-12 09:00:00.393306026   275.0   275.0
1   2024-02-12 09:00:00.393347792   275.0   275.0
2   2024-02-12 09:00:00.393351971   237.0   275.0
3   2024-02-12 09:00:00.393355738   237.0   275.0
4   2024-02-12 09:00:00.393389724   202.0   275.0
5   2024-02-12 09:00:00.542780521   202.0   202.0
6   2024-02-12 09:00:00.542841917   202.0   202.0
7   2024-02-12 09:00:00.714845055   262.0   262.0
8   2024-02-12 09:00:00.714908862   261.0   262.0
9   2024-02-12 09:00:00.747016524   263.0   263.0

Expected output: Image

like image 810
IGottaLearnMath Avatar asked Dec 22 '25 02:12

IGottaLearnMath


1 Answers

IIUC you can indeed use a merge_asof. You however need to adapt the parameters to perform the search in the correct order:

delta = pd.Timedelta('1ms')
df['out'] = pd.merge_asof(df['event_timestamp'].sub(delta), df,
                          direction='backward')['ask_size_1']

NB. I'm assuming here that the timestamps are already sorted. If not you need to sort them before running the merge_asof.

Output:

                event_timestamp  ask_size_1    out
0 2024-02-12 09:00:00.393306026       271.0    NaN
1 2024-02-12 09:00:00.393347792       275.0    NaN
2 2024-02-12 09:00:00.393351971       237.0    NaN
3 2024-02-12 09:00:00.393355738       237.0    NaN
4 2024-02-12 09:00:00.393389724       202.0    NaN
5 2024-02-12 09:00:00.542780521       206.0  202.0
6 2024-02-12 09:00:00.542841917        51.0  202.0
7 2024-02-12 09:00:00.714845055       262.0   51.0
8 2024-02-12 09:00:00.714908862       261.0   51.0
9 2024-02-12 09:00:00.747016524       263.0  261.0

If you want to get the 271 for the yellow values, you could adapt is slightly:

tmp = pd.concat([pd.DataFrame({'event_timestamp': [df['event_timestamp'].iloc[0]-delta],
                               'ask_size_1': [df['ask_size_1'].iloc[0]]}),
                 df])

delta = pd.Timedelta('1ms')

df['out'] = pd.merge_asof(df['event_timestamp'].sub(delta), tmp,
                          direction='backward',
                          allow_exact_matches=False)['ask_size_1']

Output:

                event_timestamp  ask_size_1    out
0 2024-02-12 09:00:00.393306026       271.0    NaN
1 2024-02-12 09:00:00.393347792       275.0  271.0
2 2024-02-12 09:00:00.393351971       237.0  271.0
3 2024-02-12 09:00:00.393355738       237.0  271.0
4 2024-02-12 09:00:00.393389724       202.0  271.0
5 2024-02-12 09:00:00.542780521       206.0  202.0
6 2024-02-12 09:00:00.542841917        51.0  202.0
7 2024-02-12 09:00:00.714845055       262.0   51.0
8 2024-02-12 09:00:00.714908862       261.0   51.0
9 2024-02-12 09:00:00.747016524       263.0  261.0
like image 156
mozway Avatar answered Dec 24 '25 18:12

mozway



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!