
Using np.searchsorted to find the most recent timestamp

I have two lists of timestamps, list_a and list_b. What is the best way to use np.searchsorted to find the most recent entry in list_a for each entry in list_b? The result would be a list_a_updated in which each element lines up with its corresponding (and later) entry in list_b. This is very similar to the question "pandas.merge: match the nearest time stamp >= the series of timestamps", but slightly different.

It embarrasses me that I cannot work out how to reverse this so that it grabs the <= timestamp instead of the >= timestamp, but I have been working on this for a while and it is less obvious than it seems. My example code is:

# In this code tradelist is list_b and balist is list_a

tradelist = np.array(list(filtereddflist[x][filtereddflist[x].columns[1]]))
df_filt = df_filter(filtereddflist2[x], 2, "BEST_BID")
balist = np.array(list(df_filt[df_filt.columns[1]]))

idx = np.searchsorted(tradelist, balist) - 1
mask = idx <= 0

df = pd.DataFrame({"tradelist": tradelist[idx][mask], "balist": balist[mask]})

And the solution is not as simple as just switching the inequality.

If it helps at all, I am dealing with stock trade and bid data and am trying to find the most recent bid (list_a) for each trade (list_b) without resorting to a for loop.

sfortney asked Oct 20 '22 15:10



1 Answer

To make our lives easier, let's use numbers instead of timestamps:

>>> a = np.arange(0, 10, 2)
>>> b = np.arange(1, 8, 3)
>>> a
array([0, 2, 4, 6, 8])
>>> b
array([1, 4, 7])

The last timestamps in a that are smaller than or equal to each item in b would be [0, 4, 6], which correspond to indices [0, 2, 3], which is exactly what we get if we do:

>>> np.searchsorted(a, b, side='right') - 1
array([0, 2, 3])
>>> a[np.searchsorted(a, b, side='right') - 1]
array([0, 4, 6])
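The same call works on sorted timestamp arrays. Here is a minimal sketch, assuming the bids and trades are already sorted datetime64 arrays; the names bid_times and trade_times and the sample values are made up for illustration. A trade that occurs before the first bid gets index -1, which would silently wrap around to the last bid, so such trades are masked out:

```python
import numpy as np

# Hypothetical sample data: both arrays must be sorted in ascending order.
bid_times = np.array(
    ["2015-01-01T09:30:00", "2015-01-01T09:30:05", "2015-01-01T09:30:10"],
    dtype="datetime64[s]",
)
trade_times = np.array(
    ["2015-01-01T09:29:59", "2015-01-01T09:30:05", "2015-01-01T09:30:07"],
    dtype="datetime64[s]",
)

# Index of the last bid at or before each trade; -1 means no such bid exists.
idx = np.searchsorted(bid_times, trade_times, side="right") - 1

# Keep only the trades that have a preceding (or simultaneous) bid.
has_bid = idx >= 0
matched_trades = trade_times[has_bid]
matched_bids = bid_times[idx[has_bid]]
```

Here the first trade has no earlier bid and is dropped; the remaining two trades are both paired with the 09:30:05 bid.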

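If you don't use side='right', you get the wrong value for the second item, where the same timestamp appears in both arrays. The default side='left' returns the index of the matching element itself, so subtracting 1 selects the element before it: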

>>> np.searchsorted(a, b) - 1
array([0, 1, 3])
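Put differently, side='right' minus 1 selects the last element <= each query, while side='left' minus 1 selects the last element strictly < it. The following sketch checks both claims on the arrays above against a brute-force count; the names right, left, ref_le and ref_lt are only illustrative:

```python
import numpy as np

a = np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]
b = np.arange(1, 8, 3)   # [1, 4, 7]

right = np.searchsorted(a, b, side="right") - 1  # last a <= b
left = np.searchsorted(a, b, side="left") - 1    # last a < b (strictly)

# Brute-force reference: count the elements of a satisfying each condition.
ref_le = np.array([np.count_nonzero(a <= x) - 1 for x in b])
ref_lt = np.array([np.count_nonzero(a < x) - 1 for x in b])

assert (right == ref_le).all()
assert (left == ref_lt).all()
```

The two results differ only for the query 4, which is the one value present in both arrays.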
Jaime answered Oct 22 '22 05:10
