I want to search a pandas DataFrame column for a target value, looking only in the forward direction, and when a bigger value is found I want to record the index difference in a result column. I managed to do this with two nested for loops, but it was painfully slow.
This is what I want to achieve, in a simplified example:
import pandas as pd
d = {
'Value' : [8,9,10,12,16,13,11,7,12,18],
'Target' : [12,12,11,15,19,11,16,11,17,18]
}
df = pd.DataFrame(data=d)
>>> df
Target Value
0 12 8
1 12 9
2 11 10
3 15 12
4 19 16
5 11 13
6 16 11
7 11 7
8 17 12
9 18 18
Our first value is 8 and its target value is 12. We look forward in the Value column for a value that surpasses this target, and we find it in row 4 with value 16. What I want to record is the index difference, which is 4-0=4.
The next value is 9, and its target value is again 12. We look forward in the values and find row 4 again with value 16. Now the index difference is 4-1=3.
Let's jump to row 4. We start looking for a value above its target (19) from index 5 onward. If no such value is found, the result is 0.
This is the result column that I want to reach.
Target Value Result
0 12 8 4
1 12 9 3
2 11 10 1
3 15 12 1
4 19 16 0
5 11 13 3
6 16 11 3
7 11 7 1
8 17 12 1
9 18 18 0
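For reference, the slow nested-loop approach described above can be sketched roughly as follows. This is a minimal reconstruction, not the exact code I ran; the function name `forward_distance_loop` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Value':  [8, 9, 10, 12, 16, 13, 11, 7, 12, 18],
    'Target': [12, 12, 11, 15, 19, 11, 16, 11, 17, 18],
})

def forward_distance_loop(values, targets):
    """Hypothetical helper: for each row i, distance to the first later
    row j whose value is strictly greater than targets[i], else 0."""
    n = len(values)
    result = [0] * n
    for i in range(n):
        # only look forward, starting at the row after i
        for j in range(i + 1, n):
            if values[j] > targets[i]:
                result[i] = j - i
                break
    return result

df['Result'] = forward_distance_loop(df['Value'].tolist(), df['Target'].tolist())
```

The inner loop stops at the first match, but in the worst case every row scans all the rows after it, which is why this gets slow on large frames.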
Can this be done without for loops?
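Use numpy broadcasting for the comparison, set the lower triangle of the resulting matrix (including the diagonal) to False with numpy.tril_indices, get the index of the first True in each row with numpy.argmax, subtract numpy.arange, and set all negative values to 0: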
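import numpy as np

# targets as a column vector, values as a row vector -> n x n comparison
t = df['Target'].values[:, None]
v = df['Value'].values
m = v > t
# keep only forward positions: clear the lower triangle and the diagonal
m[np.tril_indices(m.shape[1])] = False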
print (m)
[[False False False False True True False False False True]
[False False False False True True False False False True]
[False False False True True True False False True True]
[False False False False True False False False False True]
[False False False False False False False False False False]
[False False False False False False False False True True]
[False False False False False False False False False True]
[False False False False False False False False True True]
[False False False False False False False False False True]
[False False False False False False False False False False]]
a = np.argmax(m, axis=1) - np.arange(len(df))
print (a)
[ 4 3 1 1 -4 3 3 1 1 -9]
df['new'] = np.where(a > 0, a, 0)
print (df)
Value Target new
0 8 12 4
1 9 12 3
2 10 11 1
3 12 15 1
4 16 19 0
5 13 11 3
6 11 16 3
7 7 11 1
8 12 17 1
9 18 18 0
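The steps above can also be wrapped in a small function. The sketch below is a variation rather than the exact code shown: it uses `np.triu(..., k=1)` to keep only the strictly-forward positions and `m.any(axis=1)` to detect rows with no match, instead of filtering negatives afterwards. The name `forward_distance` is made up for illustration.

```python
import numpy as np
import pandas as pd

def forward_distance(values, targets):
    """Hypothetical helper: distance to the first later value that is
    strictly greater than the row's target, or 0 if there is none."""
    v = np.asarray(values)
    t = np.asarray(targets)[:, None]
    # n x n boolean matrix; k=1 keeps only columns to the right of the diagonal
    m = np.triu(v > t, k=1)
    # argmax returns the first True per row (0 when the row is all False)
    dist = m.argmax(axis=1) - np.arange(len(v))
    # rows without any forward match get 0
    return np.where(m.any(axis=1), dist, 0)

df = pd.DataFrame({
    'Value':  [8, 9, 10, 12, 16, 13, 11, 7, 12, 18],
    'Target': [12, 12, 11, 15, 19, 11, 16, 11, 17, 18],
})
df['new'] = forward_distance(df['Value'], df['Target'])
```

Note that either version builds an n x n boolean matrix, so memory grows quadratically with the number of rows; for very long columns it may be necessary to process the data in chunks or fall back to a loop.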