I am trying to search through a Pandas Dataframe to find where it has a missing entry or a NaN entry.
Here is a dataframe that I am working with:
cl_id a c d e A1 A2 A3 0 1 -0.419279 0.843832 -0.530827 text76 1.537177 -0.271042 1 2 0.581566 2.257544 0.440485 dafN_6 0.144228 2.362259 2 3 -1.259333 1.074986 1.834653 system 1.100353 3 4 -1.279785 0.272977 0.197011 Fifty -0.031721 1.434273 4 5 0.578348 0.595515 0.553483 channel 0.640708 0.649132 5 6 -1.549588 -0.198588 0.373476 audio -0.508501 6 7 0.172863 1.874987 1.405923 Twenty NaN NaN 7 8 -0.149630 -0.502117 0.315323 file_max NaN NaN
NOTE: The blank entries are empty strings - this is because there was no alphanumeric content in the file that the dataframe came from.
If I have this dataframe, how can I find a list with the indexes where the NaN or blank entry occurs?
In order to check missing values in Pandas DataFrame, we use a function isnull() and notnull(). Both function help in checking whether a value is NaN or not. These function can also be used in Pandas Series in order to find null values in a series.
shape() method returns the number of rows and number of columns as a tuple, you can use this to check if pandas DataFrame is empty. DataFrame. shape[0] return number of rows. If you have no rows then it gives you 0 and comparing it with 0 gives you True .
Select rows with missing values in a Pandas DataFrame If we want to quickly find rows containing empty values in the entire DataFrame, we will use the DataFrame isna() and isnull() methods, chained with the any() method.
np.where(pd.isnull(df))
returns the row and column indices where the value is NaN:
In [152]: import numpy as np In [153]: import pandas as pd In [154]: np.where(pd.isnull(df)) Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7])) In [155]: df.iloc[2,7] Out[155]: nan In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))] Out[160]: [nan, nan, nan, nan, nan, nan]
Finding values which are empty strings could be done with applymap:
In [182]: np.where(df.applymap(lambda x: x == '')) Out[182]: (array([5]), array([7]))
Note that using applymap
requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead so you could use pd.isnull
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With