Find empty or NaN entry in Pandas Dataframe

I am trying to search through a Pandas Dataframe to find where it has a missing entry or a NaN entry.

Here is a dataframe that I am working with:

cl_id       a           c         d         e        A1              A2             A3
    0       1   -0.419279  0.843832 -0.530827    text76        1.537177      -0.271042
    1       2    0.581566  2.257544  0.440485    dafN_6        0.144228       2.362259
    2       3   -1.259333  1.074986  1.834653    system                       1.100353
    3       4   -1.279785  0.272977  0.197011     Fifty       -0.031721       1.434273
    4       5    0.578348  0.595515  0.553483   channel        0.640708       0.649132
    5       6   -1.549588 -0.198588  0.373476     audio       -0.508501
    6       7    0.172863  1.874987  1.405923    Twenty             NaN            NaN
    7       8   -0.149630 -0.502117  0.315323  file_max             NaN            NaN

NOTE: The blank entries are empty strings - this is because there was no alphanumeric content in the file that the dataframe came from.

Given this dataframe, how can I get a list of the indexes where a NaN or blank entry occurs?

edesz asked Nov 26 '14 21:11


People also ask

How do you find the blank value of a DataFrame?

To check for missing values in a pandas DataFrame, use isnull() and notnull(). Both functions report, element by element, whether a value is missing (NaN or None). They can also be used on a pandas Series to find null values in that series.

How do you check if a cell is empty in pandas DataFrame?

DataFrame.shape is an attribute (not a method) that returns the number of rows and columns as a tuple, so you can use it to check whether a DataFrame as a whole is empty. DataFrame.shape[0] gives the number of rows; if there are no rows it is 0, and comparing it with 0 gives True. Note that this tests whether the DataFrame has no rows, not whether an individual cell is blank.

How do you get blank rows in pandas?

To quickly find the rows that contain a missing value anywhere in a DataFrame, use the DataFrame isna() or isnull() method chained with any(axis=1), and use the resulting boolean Series to select rows.
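A minimal sketch of that row selection, using a small made-up DataFrame (the column names x and y and the values are assumptions for illustration, not data from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 has a NaN, row 2 has a None.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["p", "q", None]})

# True for every row that has at least one missing value.
mask = df.isna().any(axis=1)

# Boolean indexing keeps only those rows.
rows_with_missing = df[mask]
print(rows_with_missing.index.tolist())
```

Note that isna() treats NaN and None as missing but not empty strings, so blank cells would need to be handled separately.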


1 Answer

np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

    In [152]: import numpy as np
    In [153]: import pandas as pd
    In [154]: np.where(pd.isnull(df))
    Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))

    In [155]: df.iloc[2,7]
    Out[155]: nan

    In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
    Out[160]: [nan, nan, nan, nan, nan, nan]
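Since the question asks for a list of indexes, the integer positions from np.where can be translated into row and column labels. The following is only a sketch on a made-up frame (the labels r0..r2 and columns a, b are assumptions, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with NaN at known positions.
df = pd.DataFrame(
    {"a": [0.1, np.nan, 0.3], "b": [np.nan, 1.2, 1.3]},
    index=["r0", "r1", "r2"],
)

# Integer row/column positions of the missing cells, in row-major order.
rows, cols = np.where(df.isna().to_numpy())

# Translate positions into (row label, column label) pairs.
labels = list(zip(df.index[rows], df.columns[cols]))
print(labels)
```

With the default integer index, df.index[rows] simply gives back the row numbers, so this step mainly helps when the index or column names are meaningful.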

Finding values which are empty strings could be done with applymap:

    In [182]: np.where(df.applymap(lambda x: x == ''))
    Out[182]: (array([5]), array([7]))

Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead so you could use pd.isnull.
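One way to arrange that, sketched here under the assumption that the blanks are exactly the empty string '', is to replace them with NaN first and then run a single pd.isnull pass. The frame below is a made-up miniature, not the asker's data. (As a side note, recent pandas releases deprecate applymap in favour of DataFrame.map, which is another reason to prefer the vectorised route.)

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame: one blank string and one NaN.
df = pd.DataFrame({
    "e": ["text76", "system", "audio"],
    "A1": [1.537177, "", -0.508501],
    "A2": [-0.271042, 1.100353, np.nan],
})

# Turn empty strings into NaN so one vectorised check covers both cases.
cleaned = df.replace("", np.nan)

# Row and column positions of every missing cell, in row-major order.
rows, cols = np.where(pd.isnull(cleaned).to_numpy())
missing = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(missing)
```

If the blanks may contain whitespace rather than being strictly empty, the replacement pattern would need adjusting; that case is not covered by this sketch.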

unutbu answered Sep 23 '22 10:09