
How to sub-select rows for equality with float dtype using pandas

I have the following example dataset.

df_csv_y = pd.read_csv('y_factors.csv')

                 time    value
0       736527.481944  27.20001
1       736527.482639  27.10001
2       736527.483333  27.10001
3       736527.484028  27.10001
4       736527.484722  27.10001
              ......

I tried to select a row with the code below.

df_csv_y[df_csv_y.time == 736527.482639]

The value I filtered on exists in the dataset, but the result was as follows.

Empty DataFrame
Columns: [time, value]
Index: []

I get a result when I filter on an integer, but I cannot retrieve rows by a float value such as those in the time column of the dataset above.

I want to know how to solve this problem.

송준석 asked Jan 03 '23 07:01



1 Answer

The issue here is that your real float values have higher precision than the displayed values, so an exact comparison against the displayed value finds nothing. You can use np.isclose with a tolerance tighter than the default to select values that are close enough:

In[165]:
df[np.isclose(df['time'], 736527.482639, rtol=1e-10)]

Out[165]: 
            time     value
1  736527.482639  27.10001

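The sample data you posted works with a plain equality check, but your real data has higher precision. Note that the third positional argument of np.isclose is the relative tolerance rtol (default 1e-05); you can also adjust the atol param to set the absolute tolerance (default 1e-08). A value a matches b when |a - b| <= atol + rtol * |b|.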
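To make the effect of the two tolerances concrete, here is a minimal sketch. The DataFrame is hypothetical: its time values are assumed to carry more digits than pandas displays, which is what the real CSV appears to contain. The tolerance of 1e-6 is also only an illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data: stored values have more digits than the six that are displayed.
df = pd.DataFrame({
    "time": [736527.4819444444, 736527.4826388889, 736527.4833333333],
    "value": [27.20001, 27.10001, 27.10001],
})

target = 736527.482639

# Exact equality finds nothing: 736527.4826388889 != 736527.482639.
exact = df[df["time"] == target]

# np.isclose tests |a - b| <= atol + rtol * |b|.
# With the default rtol=1e-05 the allowed gap is about 7.4 here, so every row matches.
too_loose = df[np.isclose(df["time"], target)]

# A purely absolute tolerance (rtol=0, atol=1e-6) selects only the intended row.
match = df[np.isclose(df["time"], target, rtol=0, atol=1e-6)]

print(len(exact), len(too_loose), list(match.index))
```

Because the time values are large, the relative term dominates unless rtol is made very small or set to zero, so an explicit atol with rtol=0 is often the easier setting to reason about.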

The other aspect to this is that comparing float values for exact equality is generally unreliable, because many decimal values cannot be represented exactly in binary floating point. When dealing with floating point values it's better to use something like np.isclose for comparison.
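As a small illustration of that point, independent of the dataset in the question, the classic 0.1 + 0.2 case shows exact equality failing while a tolerance-based check succeeds:

```python
import numpy as np

total = 0.1 + 0.2

# The sum is not exactly 0.3 in binary floating point.
exact_equal = (total == 0.3)

# A tolerance-based comparison treats the two values as equal.
close_enough = bool(np.isclose(total, 0.3))

print(exact_equal)   # False
print(close_enough)  # True
```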

EdChum answered Jan 05 '23 07:01
