Merging two pandas dataframes by interval

I have two pandas dataframes with the following format:

import pandas as pd

df_ts = pd.DataFrame([
        [10, 20, 1,  'id1'],
        [11, 22, 5,  'id1'],
        [20, 54, 5,  'id2'],
        [22, 53, 7,  'id2'],
        [15, 24, 8,  'id1'],
        [16, 25, 10, 'id1']
    ], columns = ['x', 'y', 'ts', 'id'])


df_statechange = pd.DataFrame([
        ['id1', 2, 'ok'],
        ['id2', 4, 'not ok'],
        ['id1', 9, 'not ok']
    ], columns = ['id', 'ts', 'state'])

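I am trying to combine them into the following format, where each row of df_ts carries the most recent state recorded for its id before that row's ts (or None if no state change has happened yet):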

df_out = pd.DataFrame([
        [10, 20, 1,  'id1', None    ],
        [11, 22, 5,  'id1', 'ok'    ],
        [20, 54, 5,  'id2', 'not ok'],
        [22, 53, 7,  'id2', 'not ok'],
        [15, 24, 8,  'id1', 'ok'    ],
        [16, 25, 10, 'id1', 'not ok']
    ], columns = ['x', 'y', 'ts', 'id', 'state'])

I understand how to accomplish this iteratively, by grouping by id, iterating through each row, and changing the state whenever a state change appears. Is there a built-in, more scalable way of doing this in pandas?

ymoiseev asked Mar 10 '26 22:03


1 Answer

Unfortunately, pandas merge supports only equality joins. See more details in the following thread: "merge pandas dataframes where one value is between two others". If you want to merge by interval, you'll need to work around this limitation, for example by adding a filter after the merge:

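# assumed: a holds the rows to label (id, ts); b holds one row per interval (id, ts1, ts2)
joined = a.merge(b, on='id')
# keep only the rows whose ts falls inside the interval [ts1, ts2]
joined = joined[joined.ts.between(joined.ts1, joined.ts2)]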
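
Applied to the frames from the question, the merge-then-filter idea might look like the sketch below. It is only an illustration under a few assumptions that are not in the original posts: each state is taken to hold from its ts up to (but not including) the next state change for the same id, and the names intervals, ts1, ts2, row, left, matched are hypothetical helpers. The filter uses explicit comparisons rather than between so that the upper bound can be exclusive.

```python
import numpy as np
import pandas as pd

df_ts = pd.DataFrame([
    [10, 20, 1,  'id1'],
    [11, 22, 5,  'id1'],
    [20, 54, 5,  'id2'],
    [22, 53, 7,  'id2'],
    [15, 24, 8,  'id1'],
    [16, 25, 10, 'id1'],
], columns=['x', 'y', 'ts', 'id'])

df_statechange = pd.DataFrame([
    ['id1', 2, 'ok'],
    ['id2', 4, 'not ok'],
    ['id1', 9, 'not ok'],
], columns=['id', 'ts', 'state'])

# Assumption: a state is valid on the half-open interval [ts1, ts2),
# where ts2 is the next state change for the same id (or infinity).
intervals = (df_statechange
             .sort_values(['id', 'ts'])
             .rename(columns={'ts': 'ts1'}))
intervals['ts2'] = intervals.groupby('id')['ts1'].shift(-1).fillna(np.inf)

# Tag each row so unmatched rows can be recovered after filtering.
left = df_ts.assign(row=range(len(df_ts)))

# Equality join on id, then filter down to the matching interval.
joined = left.merge(intervals, on='id')
matched = joined[(joined['ts'] >= joined['ts1']) & (joined['ts'] < joined['ts2'])]

# Left-join back so rows with no preceding state change keep a missing state.
df_out = (left.merge(matched[['row', 'state']], on='row', how='left')
              .drop(columns='row'))

print(df_out)
```

Note that the intermediate joined frame holds one row per (df_ts row, interval) pair for each id, so it can grow large when an id has many state changes. The missing state in the first row comes out as NaN rather than None. pandas also provides pd.merge_asof, which matches each row to the nearest earlier key on sorted data and may be worth considering for this kind of lookup.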
Dimgold answered Mar 12 '26 11:03
