
Select rows from a DataFrame based on presence of null value in specific column or columns

Tags: python, pandas

I have imported an xls file as a pandas DataFrame. There are two columns containing coordinates, which I will use to merge the DataFrame with others that have geolocation data. df.info() shows 8859 records, but the coordinate columns show '8835 non-null float64'.

I want to eyeball the 24 rows (that I assume are null) with all column records, to see whether one of the other columns (street address, town) can be used to manually add back the coordinates for those 24 records. I.e. return the DataFrame rows where df['Easting'] is null/NaN.

I have adapted the method given here as below:

df.loc[df['Easting'] == NaN]

But I get back an empty DataFrame (0 rows × 24 columns), which makes no sense (to me). Attempting to use Null or Non null doesn't work, as these values aren't defined. What am I missing?

asked Mar 13 '23 by mapping dom
1 Answer

I think you need isnull for checking NaN values with boolean indexing:

df[df['Easting'].isnull()]
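For illustration, here is a minimal self-contained sketch. The column names `Easting` and `Northing` come from the question; the values and the `Street` column are made up. It also shows how to extend the filter to rows missing *either* coordinate column, as the question title asks:

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the question's data (values are invented for illustration)
df = pd.DataFrame({
    'Easting':  [530000.0, np.nan, 531500.0, np.nan],
    'Northing': [180000.0, 181200.0, np.nan, np.nan],
    'Street':   ['High St', 'Low Rd', 'Mill Ln', 'New Way'],
})

# Rows where Easting is missing
missing_easting = df[df['Easting'].isnull()]

# Rows where either coordinate column is missing
missing_any = df[df[['Easting', 'Northing']].isnull().any(axis=1)]

print(missing_easting)
print(missing_any)
```

`isna()` is an alias for `isnull()` in recent pandas versions, and `notnull()`/`notna()` give the complement (the rows that *do* have coordinates).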

Docs:

Warning

One has to be mindful that in python (and numpy), the nan's don’t compare equal, but None's do. Note that Pandas/numpy uses the fact that np.nan != np.nan, and treats None like np.nan.

In [11]: None == None
Out[11]: True

In [12]: np.nan == np.nan
Out[12]: False

So as compared to above, a scalar equality comparison versus a None/np.nan doesn’t provide useful information.

In [13]: df2['one'] == np.nan
Out[13]: 
a    False
b    False
c    False
d    False
e    False
f    False
g    False
h    False
Name: one, dtype: bool
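This is exactly why the original attempt returned an empty DataFrame: the equality mask is all `False`. A small sketch (with a hypothetical one-column frame) reproduces the pitfall and the fix:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value
df = pd.DataFrame({'Easting': [1.0, np.nan, 3.0]})

# Comparing against np.nan yields an all-False mask, so this is always empty
empty = df.loc[df['Easting'] == np.nan]

# isnull() builds the correct mask and returns the NaN row
found = df[df['Easting'].isnull()]

print(len(empty))   # 0 rows
print(len(found))   # 1 row
```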
answered Mar 16 '23 by jezrael