Suppose I have a time series of values called X. I now want to know the first index after which the values of some other series Y will be reached by X. Put differently, for each index i I want to know the first index j such that the line formed by X from j-1 to j intersects the value of Y at i.
Below is an example of X and Y series, with the resulting values of Z. These series always have the same length:
X | Y | Z
2 | 3 | 2
2 | 3 | NaN
4 | 4.5 | 3
5 | 5 | NaN
4 | 5 | NaN
3 | 2 | 6
1 | 2 | NaN
Does pandas or NumPy offer something that can help with this? This function will run on large datasets, so I can't use Python loops.
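The table is consistent with the following reading of the question: for row i, scan the segments X[k] -> X[k+1] for the first one that strictly crosses Y[i] and report its end index j = k + 1; a j already reported for an earlier row becomes NaN (which is why rows 1 and 6 are NaN even though the same segment crosses their Y value). Strict crossing is an assumption inferred from rows 3 and 4, where X touches 5 exactly but Z is NaN. Below is a plain-Python sketch of that reading. It is far too slow for large data, but it can serve as a check for a vectorized version on small inputs. The name `first_crossings_reference` is hypothetical, and, like the answer below, it searches from the first segment rather than from row i (for this data both give the same Z).

```python
import math

def first_crossings_reference(x, y):
    """Slow O(n^2) reference, for checking small inputs only (hypothetical helper).

    For each i, find the first segment x[k] -> x[k+1] that strictly crosses y[i]
    and report its end index j = k + 1, or NaN if there is none. A j that was
    already reported for an earlier row is replaced by NaN.
    """
    n = len(x)
    seen = set()
    z = []
    for i in range(n):
        j = math.nan
        for k in range(n - 1):
            lo, hi = min(x[k], x[k + 1]), max(x[k], x[k + 1])
            if lo < y[i] < hi:  # strict: touching y[i] exactly does not count
                j = k + 1
                break
        if not math.isnan(j):
            if j in seen:  # same crossing already reported for an earlier row
                j = math.nan
            else:
                seen.add(j)
        z.append(j)
    return z

print(first_crossings_reference([2, 2, 4, 5, 4, 3, 1], [3, 3, 4.5, 5, 5, 2, 2]))
# [2, nan, 3, nan, nan, 6, nan]
```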
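Use NumPy broadcasting to compare every Y value with every segment of X (each X value paired with its shifted successor), then get the index of the first True in each row with DataFrame.idxmax. One small trick: an extra column labelled NaN that is always True makes idxmax return NaN when a row has no True values. Finally, duplicated values are removed: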
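```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [2, 2, 4, 5, 4, 3, 1],
                   'Y': [3, 3, 4.5, 5, 5, 2, 2]})

a = df['X']
b = df['Y']
a1 = a.values                      # start of each segment, X[k]
a2 = a.shift(-1).ffill().values    # end of each segment, X[k+1] (last row repeats itself)
b1 = b.values[:, None]             # Y as a column so it broadcasts against all segments
# arr[i, k] is True if the segment X[k] -> X[k+1] strictly crosses Y[i]
arr = (((a1 < b1) & (a2 > b1)) | ((a1 > b1) & (a2 < b1)))
df = pd.DataFrame(arr)
df[np.nan] = True                  # sentinel column: idxmax returns NaN if a row has no crossing
out = df.idxmax(axis=1) + 1        # first crossing segment k -> end index j = k + 1
out = out.mask(out.duplicated())   # keep only the first row that maps to each j
print(out)
```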
```
0    2.0
1    NaN
2    3.0
3    NaN
4    NaN
5    6.0
6    NaN
dtype: float64
```
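The approach above builds an n × n boolean matrix, so memory grows quadratically: for 100,000 rows that matrix alone is 10^10 bytes (about 10 GB). Below is a minimal NumPy-only sketch of the same crossing rule, assuming equal-length numeric inputs. It replaces the NaN sentinel column with `any`/`argmax`, processes Y in chunks of rows so only a `chunk × (len(x) - 1)` block is held at a time, and has an optional `only_after` flag that ignores segments starting before row i (an assumption about how "after" in the question is meant; for the example data it gives the same Z). `first_crossings`, `only_after` and `chunk` are hypothetical names, and the Python loop runs once per chunk, not once per row.

```python
import numpy as np

def first_crossings(x, y, only_after=False, chunk=1024):
    """Chunked, vectorized first-crossing search (hypothetical helper).

    Returns a float array: for each i, j = k + 1 for the first segment
    x[k] -> x[k+1] that strictly crosses y[i], else NaN. Repeated j values
    become NaN, like out.mask(out.duplicated()) in the answer.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.full(len(y), np.nan)
    if len(x) < 2:                        # no segments at all
        return z
    start, end = x[:-1], x[1:]            # segment k runs from x[k] to x[k+1]
    lo, hi = np.minimum(start, end), np.maximum(start, end)
    k = np.arange(len(start))
    for s in range(0, len(y), chunk):
        yc = y[s:s + chunk, None]         # column vector, broadcasts against segments
        hit = (lo < yc) & (yc < hi)       # hit[r, k]: segment k strictly crosses y[s + r]
        if only_after:
            # Assumption: only segments starting at row i or later count.
            rows = np.arange(s, s + len(yc))[:, None]
            hit &= k >= rows
        first = hit.argmax(axis=1)        # position of first True (0 if none)
        found = hit.any(axis=1)
        z[s:s + len(yc)] = np.where(found, first + 1, np.nan)
    # Keep only the first occurrence of each j; NaN entries stay NaN.
    valid = ~np.isnan(z)
    zv = z[valid]
    _, first_pos = np.unique(zv, return_index=True)
    keep = np.zeros(len(zv), dtype=bool)
    keep[first_pos] = True
    zv[~keep] = np.nan
    z[valid] = zv
    return z

print(first_crossings([2, 2, 4, 5, 4, 3, 1], [3, 3, 4.5, 5, 5, 2, 2]))
```

The `chunk` size trades speed for memory; each chunk's boolean block takes `chunk * (len(x) - 1)` bytes, so it should be chosen to fit the available RAM.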