I'm using this line of code to determine values in a column of a large dataframe, df, that are close to the value of A (within a tolerance):
df[df[['column']].apply(numpy.isclose, b=A, atol=0.004).any(axis=1)]
However, in some instances, A can have multiple values (i.e. 2-4 different values). Is there a method that would allow me to loop through each of the values of A to test each one? I know that the line of code shown above will only let me use A if it has one value assigned to it.
An example of this would be (using a much shorter dataframe):
column1 column2
0 0.902062 5.8
1 0.557808 3.3
2 0.655985 3.9
3 0.832471 4.1
4 0.199884 1.2
5 0.127254 1.8
6 0.771439 4.9
7 0.432289 2.8
8 0.385282 2.2
9 0.783643 3.7
Where A has values:
A=[0.432, 0.783, 0.902]
But in another example, it may have the values:
A=[0.558, 0.002]
(where it is obvious here that nothing from the dataframe will actually match 0.002).
I'd like some code which would be able to return the rows from the dataframe where the column1 values match the values of A for ALL examples, regardless of the number of different A values (and if there is no match, return "NaN" instead).
I believe you need to repeat the column values with numpy.broadcast_to before using numpy.isclose:
import numpy as np
import pandas as pd

np.random.seed(142)
df = pd.DataFrame({'column':np.random.rand(10)})
print (df)
column
0 0.902062
1 0.557808
2 0.655985
3 0.832471
4 0.199884
5 0.127254
6 0.771439
7 0.432289
8 0.385282
9 0.783643
A = [0.432, 0.783, 0.902]
# repeat the column once for each value in A
len_A = len(A)
a = np.broadcast_to(df['column'].values[:, None], (len(df),len_A))
print (a)
[[0.90206152 0.90206152 0.90206152]
[0.55780754 0.55780754 0.55780754]
[0.65598471 0.65598471 0.65598471]
[0.83247141 0.83247141 0.83247141]
[0.19988419 0.19988419 0.19988419]
[0.12725426 0.12725426 0.12725426]
[0.77143911 0.77143911 0.77143911]
[0.43228855 0.43228855 0.43228855]
[0.38528223 0.38528223 0.38528223]
[0.78364337 0.78364337 0.78364337]]
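# pandas alternative: build the same repeated columns as a DataFrame
# (shown for comparison only; m is reassigned by np.isclose below)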
m = pd.concat([df['column']] * len_A, axis=1)
print (m)
column column column
0 0.902062 0.902062 0.902062
1 0.557808 0.557808 0.557808
2 0.655985 0.655985 0.655985
3 0.832471 0.832471 0.832471
4 0.199884 0.199884 0.199884
5 0.127254 0.127254 0.127254
6 0.771439 0.771439 0.771439
7 0.432289 0.432289 0.432289
8 0.385282 0.385282 0.385282
9 0.783643 0.783643 0.783643
m = np.isclose(a, b=A, atol=0.004)
print (m)
[[False False True]
[False False False]
[False False False]
[False False False]
[False False False]
[False False False]
[False False False]
[ True False False]
[False False False]
[False True False]]
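As a side note, np.isclose broadcasts its two arguments itself, so the explicit np.broadcast_to step is probably not strictly required. Below is a minimal sketch of that shortcut. It assumes the column values are typed in by hand from the printed frame above (rounded to six decimals) rather than regenerated from the seed, and the names col and mask are only illustrative.

```python
import numpy as np
import pandas as pd

# Values copied from the printed DataFrame above (rounded to 6 decimals),
# an assumption made so the sketch does not depend on the random seed.
df = pd.DataFrame({'column': [0.902062, 0.557808, 0.655985, 0.832471, 0.199884,
                              0.127254, 0.771439, 0.432289, 0.385282, 0.783643]})
A = [0.432, 0.783, 0.902]

# A column vector of shape (10, 1) compared with A of shape (3,)
# broadcasts to a (10, 3) boolean array, one column per value in A.
col = df['column'].to_numpy()[:, None]
mask = np.isclose(col, A, atol=0.004)

print(mask.shape)
print(df[mask.any(axis=1)])
```

The resulting mask has the same layout as m above: one row per DataFrame row and one column per value in A.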
Then check each row for at least one True with any:
print (m.any(axis=1))
[ True False False False False False False True False True]
Finally, filter the DataFrame with boolean indexing:
print (df[m.any(axis=1)])
column
0 0.902062
7 0.432289
9 0.783643
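The question also asks for "NaN" when a value of A has no match, which the filter above does not report. One possible way to get that is to reduce the mask per value of A (axis=0) instead of per row. This is only a sketch: match_targets is a hypothetical helper, not a pandas or NumPy function, the data is typed in from the printed frame, and it assumes that reporting just the first match per target is enough.

```python
import numpy as np
import pandas as pd

# Values copied from the printed DataFrame above (rounded to 6 decimals).
df = pd.DataFrame({'column': [0.902062, 0.557808, 0.655985, 0.832471, 0.199884,
                              0.127254, 0.771439, 0.432289, 0.385282, 0.783643]})


def match_targets(frame, col, targets, atol=0.004):
    """Hypothetical helper: first close value per target, NaN if none."""
    values = frame[col].to_numpy()
    # mask[i, j] is True when row i is within tolerance of targets[j]
    mask = np.isclose(values[:, None], targets, atol=atol)
    has_match = mask.any(axis=0)
    # argmax gives the position of the first True in each mask column
    # (it is 0 when there is no True, which has_match guards against)
    first_idx = mask.argmax(axis=0)
    matched = np.where(has_match, values[first_idx], np.nan)
    return pd.DataFrame({'target': targets, 'matched': matched})


result = match_targets(df, 'column', [0.558, 0.002])
print(result)
```

With A = [0.558, 0.002], the first target pairs with 0.557808 and the second comes back as NaN, since nothing in the column is within 0.004 of 0.002.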