 

How to vectorize pandas dataframe forward column value search

I want to search for a target value in a pandas dataframe column, only in the forward direction, and if a bigger value is found, record the index difference in a result column. I managed to do this with two nested for loops, but it was painfully slow.

This is what I want to achieve in a simplified example.

import pandas as pd

d = {
    'Value'  : [8,9,10,12,16,13,11,7,12,18],
    'Target' : [12,12,11,15,19,11,16,11,17,18]
    }
df = pd.DataFrame(data=d)


>>> df

   Target  Value
0      12      8
1      12      9
2      11     10
3      15     12
4      19     16
5      11     13
6      16     11
7      11      7
8      17     12
9      18     18

The first value is 8, and its target is 12. We look forward in the Value column for a value that exceeds this target and find it in row 4, with value 16. What I want to record is the index difference, 4 - 0 = 4.

The next value is 9, and its target is again 12. Looking forward in the Value column, we again find row 4 with value 16. Now the index difference is 4 - 1 = 3.

Let's jump to row 4. We look for a value exceeding its target (19), starting from index 5 onward. If no such value is found, as here, the result is 0.

This is the result column I want to get:

   Target  Value  Result
0      12      8       4
1      12      9       3
2      11     10       1
3      15     12       1
4      19     16       0
5      11     13       3
6      16     11       3
7      11      7       1
8      17     12       1
9      18     18       0
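For reference, the rule described above can be written as a straightforward nested loop. This is a minimal sketch, not the asker's actual code; the function name `forward_search_loop` is hypothetical. It assumes "bigger" means strictly greater, which is what the expected Result column implies (row 5 skips the 11 in row 6).

```python
import pandas as pd

def forward_search_loop(values, targets):
    """Hypothetical reference implementation of the forward search.

    For each row i, scan rows i+1, i+2, ... and return the distance to the
    first row whose value is strictly greater than targets[i]; 0 if none.
    """
    n = len(values)
    result = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if values[j] > targets[i]:
                result[i] = j - i
                break  # only the first match matters
    return result

d = {
    'Value'  : [8, 9, 10, 12, 16, 13, 11, 7, 12, 18],
    'Target' : [12, 12, 11, 15, 19, 11, 16, 11, 17, 18],
}
df = pd.DataFrame(data=d)
df['Result'] = forward_search_loop(df['Value'].tolist(), df['Target'].tolist())
print(df['Result'].tolist())  # [4, 3, 1, 1, 0, 3, 3, 1, 1, 0]
```

This is O(n²) in pure Python, which explains the slowness for large frames.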

Can this be done without for loops?

akifusenet asked May 11 '19 09:05


1 Answer

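Use NumPy broadcasting to compare every target with every value, set the lower triangle of the boolean matrix (including the diagonal) to False so that only later rows can match, get the index of the first True in each row with numpy.argmax, subtract the row positions with np.arange, and set all negative results to 0: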

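import numpy as np

t = df['Target'].values[:, None]         # targets as a column vector, shape (n, 1)
v = df['Value'].values                   # values as a row vector, shape (n,)
m = v > t                                # m[i, j] is True when Value[j] > Target[i]
m[np.tril_indices(m.shape[1])] = False   # drop j <= i, keep only later rows
print(m)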
[[False False False False  True  True False False False  True]
 [False False False False  True  True False False False  True]
 [False False False  True  True  True False False  True  True]
 [False False False False  True False False False False  True]
 [False False False False False False False False False False]
 [False False False False False False False False  True  True]
 [False False False False False False False False False  True]
 [False False False False False False False False  True  True]
 [False False False False False False False False False  True]
 [False False False False False False False False False False]]

# position of the first True in each row, minus the row's own position
a = np.argmax(m, axis=1) - np.arange(len(df))
print(a)
[ 4  3  1  1 -4  3  3  1  1 -9]
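The negative entries for rows 4 and 9 come from `np.argmax` returning 0 when a row is entirely False. A minimal sketch, assuming the same mask construction as above, that makes the "no match" case explicit with `m.any(axis=1)` instead of relying on the sign of the offset:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Value'  : [8, 9, 10, 12, 16, 13, 11, 7, 12, 18],
    'Target' : [12, 12, 11, 15, 19, 11, 16, 11, 17, 18],
})

t = df['Target'].to_numpy()[:, None]     # column vector, shape (n, 1)
v = df['Value'].to_numpy()               # row vector, shape (n,)
m = v > t                                # m[i, j] is True when Value[j] > Target[i]
m[np.tril_indices(m.shape[1])] = False   # keep only strictly later rows j > i

print(np.argmax(np.zeros(3, dtype=bool)))  # an all-False row: argmax gives 0

found = m.any(axis=1)                    # True where some later row exceeds the target
offset = np.argmax(m, axis=1) - np.arange(len(df))
result = np.where(found, offset, 0)
print(result)                            # [4 3 1 1 0 3 3 1 1 0]
```

Both versions give the same column here; the explicit mask simply documents why the zeros appear.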

df['new'] = np.where(a > 0, a, 0)   # rows with no later match become 0
print(df)
   Value  Target  new
0      8      12    4
1      9      12    3
2     10      11    1
3     12      15    1
4     16      19    0
5     13      11    3
6     11      16    3
7      7      11    1
8     12      17    1
9     18      18    0
jezrael answered Sep 29 '22 14:09
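The boolean matrix `m` in the answer has len(df)² entries, so for large frames memory can become the bottleneck. A minimal sketch, assuming the same comparison logic, that processes rows in blocks so only a `chunk × n` slice exists at a time; the function name `forward_search_chunked` and its `chunk` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

def forward_search_chunked(values, targets, chunk=1024):
    """Hypothetical blocked variant of the broadcasting approach.

    Returns, for each row i, the distance to the first later row j > i
    with values[j] > targets[i], or 0 if there is none.
    """
    values = np.asarray(values)
    targets = np.asarray(targets)
    n = len(values)
    cols = np.arange(n)
    result = np.zeros(n, dtype=np.int64)
    for start in range(0, n, chunk):
        rows = np.arange(start, min(start + chunk, n))   # row indices in this block
        m = values[None, :] > targets[rows][:, None]     # shape (len(rows), n)
        m &= cols[None, :] > rows[:, None]               # only strictly later rows
        found = m.any(axis=1)                            # rows with at least one match
        first = np.argmax(m, axis=1)                     # column of the first True
        result[rows] = np.where(found, first - rows, 0)
    return result

df = pd.DataFrame({
    'Value'  : [8, 9, 10, 12, 16, 13, 11, 7, 12, 18],
    'Target' : [12, 12, 11, 15, 19, 11, 16, 11, 17, 18],
})
df['new'] = forward_search_chunked(df['Value'], df['Target'], chunk=4)
print(df['new'].tolist())  # [4, 3, 1, 1, 0, 3, 3, 1, 1, 0]
```

The work is still O(n²) comparisons; only the peak memory drops from n² to roughly `chunk * n` booleans.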