 

List column names that are NULL/empty in each row of a DataFrame

I have a DataFrame with null/empty values in it.
I can easily get the number of null values in each row like this:

df['NULL_COUNT'] = len(fields) - df[fields].count(axis=1)

This puts the number of NULL columns in each row into the NULL_COUNT field.
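A minimal, runnable sketch of the counting step on the same example data the question uses. It assumes `fields` is a plain list of the column names to inspect, since the question does not show how `fields` is defined. `count(axis=1)` counts the non-null cells per row, so subtracting it from the number of inspected columns gives the nulls per row; summing the Boolean `isnull()` mask across each row gives the same numbers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)],
                  columns=['A', 'B', 'C'])
fields = ['A', 'B', 'C']  # assumption: the list of columns to inspect

# non-null cells per row, subtracted from the number of inspected columns
df['NULL_COUNT'] = len(fields) - df[fields].count(axis=1)

# equivalent formulation: add up the True values of the null mask per row
assert (df['NULL_COUNT'] == df[fields].isnull().sum(axis=1)).all()

print(df['NULL_COUNT'].tolist())  # [0, 1, 1, 0, 0]
```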

Is there a similar way to write the names of the NULL columns into another field?

df['NULL_FIELD_NAMES'] = "<some query expression>"

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)], columns=['A', 'B', 'C'])

In the df above, the 2nd row (index 1) should have df['NULL_FIELD_NAMES'] = 'B' and the 3rd row (index 2) should have df['NULL_FIELD_NAMES'] = 'C'.

asked Mar 20 '17 at 12:03 by code base 5000


People also ask

How do you check which columns have null values in pandas?

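To check for null values in a pandas DataFrame, use the isnull() method (or its alias isna()). It returns a DataFrame of Boolean values that are True wherever a value is NaN; chaining any() on that result, as in df.isnull().any(), tells you for each column whether it contains at least one null.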
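A short sketch of that check on a small made-up frame (the data is illustrative only): the Boolean mask from `isnull()` is reduced per column with `any()` to find the columns that contain nulls, and with `sum()` to count them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, np.nan, 0], 'B': [1, 1, 1], 'C': [2, 0, np.nan]})

mask = df.isnull()        # Boolean DataFrame, True where a value is NaN
has_nulls = mask.any()    # one flag per column: at least one NaN?

# keep only the columns whose flag is True
null_columns = has_nulls[has_nulls].index.tolist()
print(null_columns)             # ['A', 'C']
print(mask.sum().tolist())      # [1, 0, 1]  (number of NaNs per column)
```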

How do I check if a column is empty in a data frame?

The DataFrame.shape attribute (not a method) returns the number of rows and columns as a tuple, so you can use it to check whether a pandas DataFrame is empty: DataFrame.shape[0] is the number of rows, and if there are no rows, comparing it with 0 gives True.
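A minimal sketch of the shape-based check, alongside the built-in `DataFrame.empty` attribute (True when either axis has length 0). Because the question asks about a *column*, the last lines also show one way to test whether every value in a single column is null; that part is an addition, not taken from the answer above.

```python
import pandas as pd

empty_df = pd.DataFrame(columns=['A', 'B', 'C'])  # columns, but no rows
full_df = pd.DataFrame({'A': [1, 2]})

print(empty_df.shape)           # (0, 3)
print(empty_df.shape[0] == 0)   # True: no rows
print(full_df.shape[0] == 0)    # False
print(empty_df.empty)           # True: one axis has length 0

# a column can exist but hold only nulls
col_df = pd.DataFrame({'A': [1, 2], 'B': [None, None]})
print(col_df['B'].isnull().all())   # True: every value in B is null
```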

How do you find empty rows in Python?

To quickly find the rows that contain missing values anywhere in the DataFrame, use the DataFrame isna() method (isnull() is an alias for it), chained with any(axis=1).
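A minimal sketch of that row filter, reusing the shape of the question's example data: `isna()` marks each missing cell, `any(axis=1)` collapses that to one flag per row, and the flags select the matching rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([range(3), [np.nan, np.nan, 0], [0, 0, np.nan], range(3)],
                  columns=['A', 'B', 'C'])

# one Boolean per row: does the row contain at least one missing value?
row_has_null = df.isna().any(axis=1)
rows_with_nulls = df[row_has_null]
print(rows_with_nulls.index.tolist())  # [1, 2]
```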


1 Answer

You can use:

df['new'] = (df.isnull() * df.columns.to_series()).apply(
    lambda row: ','.join(name for name in row if name), axis=1)
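A minimal sketch that unpacks the multiply-then-join idea step by step. Multiplying the Boolean null mask by the column names relies on Python's `True * 'A' == 'A'` and `False * 'A' == ''`, so every null cell becomes its column name and every other cell an empty string. One assumption here: the names Series is built explicitly with `dtype=object` so the multiplication runs on plain Python strings. Joining only the non-empty names keeps rows with non-adjacent nulls (A and C) free of a double comma.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([range(3), [np.nan, 1, np.nan], [0, 0, np.nan]],
                  columns=['A', 'B', 'C'])

# column names indexed by the column labels, so they align with df's columns
names = pd.Series(list(df.columns), index=df.columns, dtype=object)

# null cells -> their column name, non-null cells -> ''
labelled = df.isnull() * names
print(labelled.values.tolist())
# [['', '', ''], ['A', '', 'C'], ['', '', 'C']]

# join only the non-empty names in each row
df['new'] = labelled.apply(lambda row: ','.join(n for n in row if n), axis=1)
print(df['new'].tolist())  # ['', 'A,C', 'C']
```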

Another solution:

df['new'] = df.apply(lambda x: ','.join(x[x.isnull()].index),axis=1)
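The row-wise idea above can be wrapped in a small reusable function that also answers the question's `fields` and NULL/Empty framing. This is a sketch under stated assumptions: `null_field_names` is a hypothetical helper name, and treating empty strings `''` as "empty" is an assumption based on the question title.

```python
import numpy as np
import pandas as pd


def null_field_names(df, fields=None, sep=','):
    """Hypothetical helper: for each row, join the names of the columns in
    `fields` (default: every column) that are null or an empty string."""
    cols = df[list(fields)] if fields is not None else df
    # NaN/None cells, plus '' cells (assumed to count as "empty")
    mask = cols.isnull() | cols.eq('')
    # row[row] keeps only the True entries; their index holds the column names
    return mask.apply(lambda row: sep.join(row[row].index), axis=1)


df = pd.DataFrame({'A': [0, np.nan, 0],
                   'B': ['x', '', 'y'],
                   'C': [2.0, 0.0, np.nan]})
df['NULL_FIELD_NAMES'] = null_field_names(df, fields=['A', 'B', 'C'])
print(df['NULL_FIELD_NAMES'].tolist())  # ['', 'A,B', 'C']
```

Passing `fields` explicitly matters once result columns such as `NULL_FIELD_NAMES` have been added to the frame, so they are not inspected themselves.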

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame([range(3), [np.nan, np.nan, 0], [0, 0, np.nan], range(3), range(3)],
                  columns=['A', 'B', 'C'])
print(df)
     A    B    C
0  0.0  1.0  2.0
1  NaN  NaN  0.0
2  0.0  0.0  NaN
3  0.0  1.0  2.0
4  0.0  1.0  2.0

df['new'] = df.apply(lambda x: ','.join(x[x.isnull()].index),axis=1)
print(df)
     A    B    C  new
0  0.0  1.0  2.0     
1  NaN  NaN  0.0  A,B
2  0.0  0.0  NaN    C
3  0.0  1.0  2.0     
4  0.0  1.0  2.0     
answered Oct 16 '22 at 10:10 by jezrael
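A different, fully vectorized technique (not part of the answer above) is a matrix product of the Boolean null mask with the column names: for each row, `DataFrame.dot` adds up `True * 'A,' == 'A,'` and `False * 'B,' == ''`, which concatenates the names of the null columns. This is a sketch assuming the names are passed as a NumPy object array with a trailing separator that is stripped afterwards.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([range(3), [np.nan, np.nan, 0], [0, 0, np.nan], [np.nan, 1, np.nan]],
                  columns=['A', 'B', 'C'])

# column names with a trailing separator, as plain Python string objects
names = np.array([f'{c},' for c in df.columns], dtype=object)

# per row: sum of (is-null flag * 'name,'), then drop the final separator
df['new'] = df.isnull().dot(names).str.rstrip(',')
print(df['new'].tolist())  # ['', 'A,B', 'C', 'A,C']
```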