How to drop rows of Pandas DataFrame whose value in a certain column is NaN

Tags:

I have this DataFrame and want only the records whose EPS column is not NaN:

>>> df                  STK_ID  EPS  cash STK_ID RPT_Date                    601166 20111231  601166  NaN   NaN 600036 20111231  600036  NaN    12 600016 20111231  600016  4.3   NaN 601009 20111231  601009  NaN   NaN 601939 20111231  601939  2.5   NaN 000001 20111231  000001  NaN   NaN

...i.e. something like df.drop(....) to get this resulting dataframe:

                  STK_ID  EPS  cash STK_ID RPT_Date                    600016 20111231  600016  4.3   NaN 601939 20111231  601939  2.5   NaN

How do I do that?

605

asked Nov 16 '12 09:11

2 Answers

Don't drop, just take the rows where EPS is not NA:

df = df[df['EPS'].notna()]

113

answered Oct 24 '22 11:10

eumiro

This question is already resolved, but...

...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.

In [24]: df = pd.DataFrame(np.random.randn(10,3))  In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;  In [26]: df Out[26]:           0         1         2 0       NaN       NaN       NaN 1  2.677677 -1.466923 -0.750366 2       NaN  0.798002 -0.906038 3  0.672201  0.964789       NaN 4       NaN       NaN  0.050742 5 -1.250970  0.030561 -2.678622 6       NaN  1.036043       NaN 7  0.049896 -0.308003  0.823295 8       NaN       NaN  0.637482 9 -0.310130  0.078891       NaN

In [27]: df.dropna()     #drop all rows that have any NaN values Out[27]:           0         1         2 1  2.677677 -1.466923 -0.750366 5 -1.250970  0.030561 -2.678622 7  0.049896 -0.308003  0.823295

In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN Out[28]:           0         1         2 1  2.677677 -1.466923 -0.750366 2       NaN  0.798002 -0.906038 3  0.672201  0.964789       NaN 4       NaN       NaN  0.050742 5 -1.250970  0.030561 -2.678622 6       NaN  1.036043       NaN 7  0.049896 -0.308003  0.823295 8       NaN       NaN  0.637482 9 -0.310130  0.078891       NaN

In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN Out[29]:           0         1         2 1  2.677677 -1.466923 -0.750366 2       NaN  0.798002 -0.906038 3  0.672201  0.964789       NaN 5 -1.250970  0.030561 -2.678622 7  0.049896 -0.308003  0.823295 9 -0.310130  0.078891       NaN

In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question) Out[30]:           0         1         2 1  2.677677 -1.466923 -0.750366 2       NaN  0.798002 -0.906038 3  0.672201  0.964789       NaN 5 -1.250970  0.030561 -2.678622 6       NaN  1.036043       NaN 7  0.049896 -0.308003  0.823295 9 -0.310130  0.078891       NaN

There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.

Pretty handy!

answered Oct 24 '22 11:10

Aman

Related questions
                            
                                Get the data received in a Flask request
                            
                                Checking whether a variable is an integer or not [duplicate]
                            
                                How to overcome "datetime.datetime not JSON serializable"?
                            
                                Find all files in a directory with extension .txt in Python
                            
                                What is the correct cross-platform way to get the home directory in Python?
                            
                                How to catch and print the full exception traceback without halting/exiting the program?
                            
                                Should I put #! (shebang) in Python scripts, and what form should it take?
                            
                                Extract file name from path, no matter what the os/path format
                            
                                What are "named tuples" in Python?
                            
                                How do I install a Python package with a .whl file?
                            
                                What is the difference between old style and new style classes in Python?
                            
                                How to return dictionary keys as a list in Python?
                            
                                Is there a simple way to delete a list element by value?
                            
                                Fastest way to check if a value exists in a list
                            
                                How to iterate through two lists in parallel?
                            
                                What are the most common Python docstring formats? [closed]
                            
                                How to make a class JSON serializable
                            
                                How does Python's super() work with multiple inheritance?
                            
                                What does the 'b' character do in front of a string literal?
                            
                                What's the difference between lists and tuples?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

Tags:

python

pandas

dataframe

nan

bigbug

People also ask

2 Answers

eumiro

Aman

Recent Activity

Donate For Us