How do I find the index at which a given value will be reached/crossed by another series?

Suppose I have a timeseries of values called X.

I want to know the first index at which X reaches each value of another series Y. Put differently, for each index i, I want the first index j such that the line segment formed by X from j-1 to j crosses the value of Y at i.

Below is an example pair of X and Y series, together with the resulting values for Z. The series always have the same length:

X | Y   | Z
2 | 3   | 2
2 | 3   | NaN
4 | 4.5 | 3
5 | 5   | NaN
4 | 5   | NaN
3 | 2   | 6
1 | 2   | NaN

Do pandas or NumPy offer anything that helps with this? The function will run on large datasets, so I can't use Python loops.

asked Apr 19 '19 at 11:04 by quant

1 Answer

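Use NumPy broadcasting to compare every Y value against each pair of consecutive X values (X and X shifted by one), then get the position of the first True in each row with DataFrame.idxmax. There are two small additions: a NaN column is appended so that rows containing only False return NaN, and duplicated results are masked at the end so that only the first occurrence of each index is kept: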

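import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'X': [2, 2, 4, 5, 4, 3, 1],
                   'Y': [3, 3, 4.5, 5, 5, 2, 2]})

a = df['X']
b = df['Y']

a1 = a.values                      # X at position j
a2 = a.shift(-1).ffill().values    # X at position j+1 (last value repeated)
b1 = b.values[:, None]             # Y as a column, so it broadcasts against X

# arr[i, j] is True where the segment X[j] -> X[j+1] strictly crosses Y[i]
arr = (((a1 < b1) & (a2 > b1)) | ((a1 > b1) & (a2 < b1)))
hits = pd.DataFrame(arr)           # separate name, so df is not overwritten
hits[np.nan] = True                # sentinel column: all-False rows yield NaN
out = hits.idxmax(axis=1) + 1      # column label of the first True, plus 1
out = out.mask(out.duplicated())   # keep only the first occurrence of each index
print (out)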
0    2.0
1    NaN
2    3.0
3    NaN
4    NaN
5    6.0
6    NaN
dtype: float64
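
Since the question mentions large datasets, note that the broadcast comparison builds a boolean matrix with one row per Y value and one column per X value, so memory grows quadratically with the series length. The following is only a sketch of one way to bound that, by processing Y in blocks. The function name first_crossing and the chunk_size parameter are made up for illustration and are not part of pandas or NumPy. It assumes the same strict-crossing rule as the code above.

```python
import numpy as np
import pandas as pd


def first_crossing(x, y, chunk_size=1024):
    """Chunked variant of the broadcasting approach (illustrative sketch).

    For each y[i], return j + 1 for the first segment x[j] -> x[j + 1] that
    strictly crosses y[i], or NaN if no segment does. Repeated results are
    masked to NaN after their first occurrence, as in the answer's code.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.full(y.size, np.nan)
    if x.size < 2:
        return pd.Series(out)  # no segments, so nothing can be crossed

    lo = x[:-1]  # segment start values, x[j]
    hi = x[1:]   # segment end values, x[j + 1]

    # Process y in blocks so the boolean matrix is at most
    # chunk_size x (len(x) - 1) at any one time.
    for start in range(0, y.size, chunk_size):
        yb = y[start:start + chunk_size, None]
        crosses = ((lo < yb) & (hi > yb)) | ((lo > yb) & (hi < yb))
        first = crosses.argmax(axis=1)   # 0 when a row has no True at all
        found = crosses.any(axis=1)      # tells real hits apart from that 0
        block = out[start:start + chunk_size]  # view into out
        block[found] = first[found] + 1

    result = pd.Series(out)
    return result.mask(result.duplicated())
```

Calling first_crossing(df['X'], df['Y']) on the question's data should give the same values as the output above. The total work is still proportional to len(X) * len(Y); only the peak memory is reduced, so chunk_size is a trade-off between memory use and the number of Python-level iterations.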
answered Sep 27 '22 at 18:09 by jezrael
