Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

I have this DataFrame and want only the records whose EPS column is not NaN:

>>> df                  STK_ID  EPS  cash STK_ID RPT_Date                    601166 20111231  601166  NaN   NaN 600036 20111231  600036  NaN    12 600016 20111231  600016  4.3   NaN 601009 20111231  601009  NaN   NaN 601939 20111231  601939  2.5   NaN 000001 20111231  000001  NaN   NaN 

...i.e. something like df.drop(....) to get this resulting dataframe:

                  STK_ID  EPS  cash STK_ID RPT_Date                    600016 20111231  600016  4.3   NaN 601939 20111231  601939  2.5   NaN 

How do I do that?

like image 605
bigbug Avatar asked Nov 16 '12 09:11

bigbug


People also ask

How do I remove rows containing NaN?

To drop all the rows with the NaN values, you may use df. dropna().

How do I delete rows in pandas DataFrame based on condition?

Use pandas. DataFrame. drop() method to delete/remove rows with condition(s).

How do I drop NaN values from columns?

Pandas DataFrame dropna() function is used to remove rows and columns with Null/NaN values. By default, this function returns a new DataFrame and the source DataFrame remains unchanged. We can create null values using None, pandas. NaT, and numpy.

How do you drop rows with certain values?

One of the fastest ways to delete rows that contain a specific value or fulfill a given condition is to filter these. Once you have the filtered data, you can delete all these rows (while the remaining rows remain intact).


2 Answers

Don't drop, just take the rows where EPS is not NA:

df = df[df['EPS'].notna()] 
like image 113
eumiro Avatar answered Oct 24 '22 11:10

eumiro


This question is already resolved, but...

...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.

In [24]: df = pd.DataFrame(np.random.randn(10,3))  In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;  In [26]: df Out[26]:           0         1         2 0       NaN       NaN       NaN 1  2.677677 -1.466923 -0.750366 2       NaN  0.798002 -0.906038 3  0.672201  0.964789       NaN 4       NaN       NaN  0.050742 5 -1.250970  0.030561 -2.678622 6       NaN  1.036043       NaN 7  0.049896 -0.308003  0.823295 8       NaN       NaN  0.637482 9 -0.310130  0.078891       NaN 

In [27]: df.dropna()     #drop all rows that have any NaN values Out[27]:           0         1         2 1  2.677677 -1.466923 -0.750366 5 -1.250970  0.030561 -2.678622 7  0.049896 -0.308003  0.823295 

In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN Out[28]:           0         1         2 1  2.677677 -1.466923 -0.750366 2       NaN  0.798002 -0.906038 3  0.672201  0.964789       NaN 4       NaN       NaN  0.050742 5 -1.250970  0.030561 -2.678622 6       NaN  1.036043       NaN 7  0.049896 -0.308003  0.823295 8       NaN       NaN  0.637482 9 -0.310130  0.078891       NaN 

In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN Out[29]:           0         1         2 1  2.677677 -1.466923 -0.750366 2       NaN  0.798002 -0.906038 3  0.672201  0.964789       NaN 5 -1.250970  0.030561 -2.678622 7  0.049896 -0.308003  0.823295 9 -0.310130  0.078891       NaN 

In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question) Out[30]:           0         1         2 1  2.677677 -1.466923 -0.750366 2       NaN  0.798002 -0.906038 3  0.672201  0.964789       NaN 5 -1.250970  0.030561 -2.678622 6       NaN  1.036043       NaN 7  0.049896 -0.308003  0.823295 9 -0.310130  0.078891       NaN 

There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.

Pretty handy!

like image 35
Aman Avatar answered Oct 24 '22 11:10

Aman