Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas - Take current row, compare value to X previous rows and return how many matches (within x% range)

I have a pandas column like so:

index colA
1     10.2
2     10.8
3     11.6
4     10.7
5     9.5
6     6.2
7     12.9
8     10.6
9     6.4
10    20.5

I want to search the current row value and find matches from previous rows that are close. For example index4 (10.7) would return a match of 1 because it is close to index2 (10.8). Similarly index8 (10.6) would return a match of 2 because it is close to both index2 and 4.

Using a threshold of +/- 5% for this example would output the below:

index colA  matches
1     10.2    0
2     10.8    0
3     11.6    0
4     10.7    2
5     9.5     0
6     6.2     0
7     12.9    0
8     10.6    3
9     6.4     1
10    20.5    0

With a large dataframe I would like to limit this to the previous X (300?) number of rows to search over rather than an entire dataframe.

like image 972
swifty Avatar asked Dec 13 '22 17:12

swifty


1 Answers

Using triangle indices to ensure we only look backwards. Then use np.bincount to accumulate the matches.

a = df.colA.values

i, j = np.tril_indices(len(a), -1)
mask = np.abs(a[i] - a[j]) / a[i] <= .05
df.assign(matches=np.bincount(i[mask], minlength=len(a)))

       colA  matches
index               
1      10.2        0
2      10.8        0
3      11.6        0
4      10.7        2
5       9.5        0
6       6.2        0
7      12.9        0
8      10.6        3
9       6.4        1
10     20.5        0

If you are having resource issues, consider using good 'ol fashion loops. However, if you have access to numba you make this considerably faster.

from numba import njit

@njit
def counter(a):
    c = np.arange(len(a)) * 0
    for i, x in enumerate(a):
        for j, y in enumerate(a):
            if j < i:
                if abs(x - y) / x <= .05:
                    c[i] += 1
    return c

df.assign(matches=counter(a))

       colA  matches
index               
1      10.2        0
2      10.8        0
3      11.6        0
4      10.7        2
5       9.5        0
6       6.2        0
7      12.9        0
8      10.6        3
9       6.4        1
10     20.5        0
like image 171
piRSquared Avatar answered Dec 29 '22 12:12

piRSquared