I have a dataframe like this
>>df1 = pd.DataFrame({'A': ['1', '2', '3', '4','5'],
'B': ['1', '1', '1', '1','1'],
'C': ['c', 'A1', None, 'c3',None],
'D': ['d0', 'B1', 'B2', None,'B4'],
'E': ['A', None, 'S', None,'S'],
'F': ['3', '4', '5', '6','7'],
'G': ['2', '2', None, '2','2']})
>>df1
A B C D E F G
0 1 1 c d0 A 3 2
1 2 1 A1 B1 None 4 2
2 3 1 None B2 S 5 None
3 4 1 c3 None None 6 2
4 5 1 None B4 S 7 2
and I drop the rows which contain NaN values:
df2 = df1.dropna()
These are the rows that were dropped:
A B C D E F G
1 2 1 A1 B1 None 4 2
2 3 1 None B2 S 5 None
3 4 1 c3 None None 6 2
4 5 1 None B4 S 7 2
These rows were dropped because they contain NaN values. However, I want to know *why* each row was dropped: which column held the first NaN value that caused the row to be removed? I need the drop reason for a report.
the output should be
['E','C','D','C']
I know I could run dropna on each column separately and record that as the reason, but that is really inefficient.
Is there a more efficient way to solve this problem? Thank you.
I think you can create a boolean DataFrame with DataFrame.isnull, then filter it by boolean indexing with a mask of rows that contain at least one True (via any), and finally call idxmax, which returns the column name of the first True value in each row:
booldf = df1.isnull()
print (booldf)
A B C D E F G
0 False False False False False False False
1 False False False False True False False
2 False False True False False False True
3 False False False True True False False
4 False False True False False False False
print (booldf.any(axis=1))
0 False
1 True
2 True
3 True
4 True
dtype: bool
print (booldf[booldf.any(axis=1)].idxmax(axis=1))
1 E
2 C
3 D
4 C
dtype: object
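Putting the steps together, here is a minimal runnable sketch (reproducing the question's df1) that produces the expected reason list:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['1', '2', '3', '4', '5'],
                    'B': ['1', '1', '1', '1', '1'],
                    'C': ['c', 'A1', None, 'c3', None],
                    'D': ['d0', 'B1', 'B2', None, 'B4'],
                    'E': ['A', None, 'S', None, 'S'],
                    'F': ['3', '4', '5', '6', '7'],
                    'G': ['2', '2', None, '2', '2']})

# Boolean mask of missing values
booldf = df1.isnull()

# Keep only rows that have at least one missing value, then take
# the column name of the first True in each row (idxmax returns
# the first occurrence of the maximum, and True > False)
reasons = booldf[booldf.any(axis=1)].idxmax(axis=1)

print(reasons.tolist())  # ['E', 'C', 'D', 'C']
```

The index of `reasons` also tells you which rows were dropped, so the same Series can feed both columns of the report.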