I have a dataframe and I need to create another dataframe that will include only records that are missing on a particular column.
'Missing' means it can be empty, 'N/A', 'NODATA' or in some cases other fixed values (I need a function so I can simply pass a list that fits the case).
I can filter it by the values, so the N/As and NODATA records are found, but it doesn't see the empty ones.
I tried multiple approaches, is it possible to do that in that way (so an empty value is a part of the list), or do I need to filter empty separately and merge it to the list-ones?
def GetEmpty(df, on, items_list=['']):
    column = df[on]
    boolean_series = column.isin(items_list)
    empty_df = df[boolean_series]
    return empty_df
empty_df = GetEmpty(df, 'Isin', ['','N/A', 'NODATA'])
Edit: the dataframe:
 System Book    Isin ID
0    ABC  DEF     XYZ  A
1    ABC  DEF     NaN  B
2    ABC  DEF     ABC  C
3    ABC  DEF  NODATA  D
If I understand correctly, you're looking to check if it's empty (ie: Python's None or Numpy.NaN) AND check if it's one of these special values - the empty string (""), "N/A", or "NODATA".
Pandas has a build-in function called isnull() which checks for None or np.NaN.
So, you could do something like this...
import pandas as pd
def GetEmpty(df, on, items_list=['']):
        boolean_series = df.apply(lambda x: True if ((pd.isnull(x[on])) or (x[on] in items_list)) else False, axis=1)
        empty_df = df[boolean_series]
        return empty_df
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With