
How do you loop and test multiple "numpy.isclose" values on a column of a dataframe?

I'm using this line of code to find the values in a column of a large dataframe, df, that are close to the value of A (within a tolerance):

df[df[['column']].apply(numpy.isclose, b=A, atol=0.004).any(1)]
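
For reference, a minimal self-contained version of this single-value usage looks like the following (the sample data and the value of A here are only illustrative):

import numpy as np
import pandas as pd

df = pd.DataFrame({'column': [0.902062, 0.557808, 0.655985, 0.832471]})
A = 0.902

#keep rows whose 'column' value is within 0.004 of A
print (df[df[['column']].apply(np.isclose, b=A, atol=0.004).any(axis=1)])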

However, in some instances, A can have multiple values (e.g. 2-4 different values). Is there a method that would allow me to loop through each of the values of A and test each one? I know that the line of code shown above only works when A is a single value.

An example of this would be (using a much shorter dataframe):

   column1   column2
0  0.902062    5.8
1  0.557808    3.3
2  0.655985    3.9
3  0.832471    4.1  
4  0.199884    1.2
5  0.127254    1.8
6  0.771439    4.9
7  0.432289    2.8
8  0.385282    2.2
9  0.783643    3.7

Where A has values:

A=[0.432, 0.783, 0.902]

But in another example, it may have the values:

A=[0.558, 0.002]

(where obviously nothing in the dataframe will actually match 0.002).

I'd like code that returns the rows of the dataframe whose column1 values match the values of A, for ALL examples, regardless of how many values A contains (and returns "NaN" if there is no match).

Neko asked Mar 05 '23

1 Answer

I believe you need to repeat the column values with numpy.broadcast_to before using numpy.isclose:

import numpy as np
import pandas as pd

np.random.seed(142)

df = pd.DataFrame({'column':np.random.rand(10)})
print (df)
     column
0  0.902062
1  0.557808
2  0.655985
3  0.832471
4  0.199884
5  0.127254
6  0.771439
7  0.432289
8  0.385282
9  0.783643

A = [0.432, 0.783, 0.902]

#repeat the column values len(A) times (one column per value in A)
len_A = len(A)
a = np.broadcast_to(df['column'].values[:, None], (len(df),len_A))
print (a)
[[0.90206152 0.90206152 0.90206152]
 [0.55780754 0.55780754 0.55780754]
 [0.65598471 0.65598471 0.65598471]
 [0.83247141 0.83247141 0.83247141]
 [0.19988419 0.19988419 0.19988419]
 [0.12725426 0.12725426 0.12725426]
 [0.77143911 0.77143911 0.77143911]
 [0.43228855 0.43228855 0.43228855]
 [0.38528223 0.38528223 0.38528223]
 [0.78364337 0.78364337 0.78364337]]

#equivalent pandas way to repeat the column (alternative to broadcast_to, not used below)
m = pd.concat([df['column']] * len_A, axis=1)
print (m)
     column    column    column
0  0.902062  0.902062  0.902062
1  0.557808  0.557808  0.557808
2  0.655985  0.655985  0.655985
3  0.832471  0.832471  0.832471
4  0.199884  0.199884  0.199884
5  0.127254  0.127254  0.127254
6  0.771439  0.771439  0.771439
7  0.432289  0.432289  0.432289
8  0.385282  0.385282  0.385282
9  0.783643  0.783643  0.783643

#compare each repeated value with the corresponding value of A (A is broadcast across the columns)
m = np.isclose(a, b=A, atol=0.004)
print (m)
[[False False  True]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [ True False False]
 [False False False]
 [False  True False]]

Then check whether each row contains at least one True with any:

print (m.any(axis=1))
[ True False False False False False False  True False  True]

Finally, filter by boolean indexing:

print (df[m.any(axis=1)])
     column
0  0.902062
7  0.432289
9  0.783643
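
For reference, the steps above can also be folded into one small helper, since np.isclose broadcasts the (len(df), 1) column against the list A by itself. The helper name filter_close and the all-NaN fallback for the no-match case are my own assumptions here (the question asked for "NaN" when nothing matches), so treat this as a sketch rather than a drop-in replacement:

import numpy as np
import pandas as pd

def filter_close(df, col, A, atol=0.004):
    #hypothetical helper: compare df[col] against every value of A at once;
    #(len(df), 1) broadcast against (len(A),) gives a (len(df), len(A)) mask
    m = np.isclose(df[col].values[:, None], A, atol=atol).any(axis=1)
    out = df[m]
    #assumption: return a single all-NaN row when nothing matches
    return out if not out.empty else pd.DataFrame({col: [np.nan]})

np.random.seed(142)
df = pd.DataFrame({'column':np.random.rand(10)})
print (filter_close(df, 'column', [0.432, 0.783, 0.902]))
     column
0  0.902062
7  0.432289
9  0.783643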
jezrael answered May 20 '23