Finding closest value while grouping by a column

Tags: python, pandas

I want to create two new columns that give, for each row, the price closest to a given value and the time at which that price occurs. This is how my DataFrame is structured:

x_time    expiration    x_price    p_time    p_price
 100          4          55.321     100        21
 105          4          51.120     105        25
 110          4          44.412     110        33.1
 100          5           9.1       100        3.1
 105          5           9.5       105        5.1
 110          5           8.2       110        12.1 
 100          6           122.1     100        155.9
 105          6           144.1     105        134.2 
 .......

Essentially, I want to create two new columns, 'closest_price' and 'closest_p_time', holding the p_price closest to each row's x_price within that row's group only (hence the group by expiration), together with the p_time of that p_price.

So, the expected results would look like this:

x_time    expiration    x_price    p_time    p_price   closest_price closest_p_time
 100          4          55.321     100        21           33.1       110
 105          4          51.120     105        25           33.1       110
 110          4          44.412     110        33.1         33.1       110
 100          5           9.1       100        3.1          12.1       110
 105          5           9.5       105        5.1          12.1       110
 110          5           8.2       110        12.1          5.1       105
 100          6           122.1     100        155.9       134.2       105
 105          6           144.1     105        134.2       134.2       105

Hopefully, this somewhat makes sense.

I have thought of a potential way to go about this:

  1. Use for-loops:
    • first loop over each expiration
    • then go through p_price, compare every value to each x_price, and select the closest one (min(abs(p_price - x_price)))
    • however, this seems like the longest way of going about it. If there is a way to vectorize this, that would be ideal!

However, I have not been successful.
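For reference, the loop-based idea described above could be sketched roughly as follows. This is only an illustration of the approach, not tested beyond the sample rows shown in the question; the elided rows are left out, and the dictionaries used to collect results are an arbitrary choice.

```python
import pandas as pd

# Sample rows copied from the question (the elided "......." rows are omitted).
df = pd.DataFrame({
    "x_time":     [100, 105, 110, 100, 105, 110, 100, 105],
    "expiration": [4, 4, 4, 5, 5, 5, 6, 6],
    "x_price":    [55.321, 51.120, 44.412, 9.1, 9.5, 8.2, 122.1, 144.1],
    "p_time":     [100, 105, 110, 100, 105, 110, 100, 105],
    "p_price":    [21, 25, 33.1, 3.1, 5.1, 12.1, 155.9, 134.2],
})

# Results are collected per row label so they can be aligned with df afterwards.
closest_price = {}
closest_p_time = {}

# Outer loop: one expiration group at a time.
for _, group in df.groupby("expiration"):
    # Inner loop: compare each x_price with every p_price of the same group.
    for idx, x_price in group["x_price"].items():
        # Label of the row whose p_price has the smallest absolute distance
        # (on a tie, the first such row wins).
        best = (group["p_price"] - x_price).abs().idxmin()
        closest_price[idx] = group.at[best, "p_price"]
        closest_p_time[idx] = group.at[best, "p_time"]

df["closest_price"] = pd.Series(closest_price)
df["closest_p_time"] = pd.Series(closest_p_time)
print(df)
```

This does one distance computation per row, so it is quadratic in the group size and slow in pure Python for large groups, which is the reason for asking about a vectorized alternative.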

Thank you!

yungpadewon asked Apr 16 '19 18:04


1 Answer

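One option is the following, which gives each row the largest p_price in the whole DataFrame that does not exceed that row's x_price. Note that it does not restrict the search to the row's expiration group and never picks a p_price above x_price, so it will not always match the expected output in the question: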

df['closest_price'] = \
    df.apply(lambda x: df[df.p_price <= x.x_price]['p_price'].max(), axis=1)
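If the search should stay inside each expiration group and consider p_price values on both sides of x_price, a variant along the following lines may be closer to what the question asks for. It is a sketch only: `add_closest` is a hypothetical helper name, the data are the sample rows from the question, and the distance computation is vectorized with NumPy broadcasting inside each group while the groups themselves are still iterated in Python.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (the elided rows are omitted).
df = pd.DataFrame({
    "x_time":     [100, 105, 110, 100, 105, 110, 100, 105],
    "expiration": [4, 4, 4, 5, 5, 5, 6, 6],
    "x_price":    [55.321, 51.120, 44.412, 9.1, 9.5, 8.2, 122.1, 144.1],
    "p_time":     [100, 105, 110, 100, 105, 110, 100, 105],
    "p_price":    [21, 25, 33.1, 3.1, 5.1, 12.1, 155.9, 134.2],
})

def add_closest(group):
    """Hypothetical helper: add the two 'closest' columns to one group."""
    x = group["x_price"].to_numpy()
    p = group["p_price"].to_numpy()
    # Pairwise |x_i - p_j| matrix for this group: rows are x, columns are p.
    # argmin along axis 1 gives, for each x, the position of the nearest p
    # (the first one on a tie).
    nearest = np.abs(x[:, None] - p[None, :]).argmin(axis=1)
    return group.assign(
        closest_price=p[nearest],
        closest_p_time=group["p_time"].to_numpy()[nearest],
    )

# Process each expiration group separately, then restore the original row order.
parts = [add_closest(group) for _, group in df.groupby("expiration")]
result = pd.concat(parts).sort_index()
print(result)
```

The pairwise matrix has one entry per (x_price, p_price) pair in a group, so memory grows with the square of the largest group; for very large groups a sort-based lookup might be preferable.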

tvgriek answered Oct 01 '22 04:10