Python - Drop row if two columns are NaN

Tags: python, pandas, dataframe

This is an extension to this question, where OP wanted to know how to drop rows where the values in a single column are NaN.

I'm wondering how I can drop rows where the values in two (or more) columns are all NaN. Using the DataFrame created in the second answer (with numpy imported as np and pandas as pd):

    In [1]: df = pd.DataFrame(np.random.randn(10, 3))

    In [2]: df.iloc[::2, 0] = np.nan; df.iloc[::4, 1] = np.nan; df.iloc[::3, 2] = np.nan

    In [3]: df
    Out[3]:
              0         1         2
    0       NaN       NaN       NaN
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN

If I use dropna(), specifically df.dropna(subset=[1, 2]), it performs an "or"-type drop (a row is dropped if column 1 or column 2 is NaN) and leaves:

    In [4]: df.dropna(subset=[1, 2])
    Out[4]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    5 -1.250970  0.030561 -2.678622
    7  0.049896 -0.308003  0.823295
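The "or" behaviour comes from the default how='any' of dropna: a row is dropped as soon as any column listed in subset is NaN. Below is a minimal sketch that rebuilds the same NaN layout and checks the default call against an explicit "or" mask. The seed passed to numpy's default_rng is an arbitrary assumption (the post used unseeded random data), so the numbers differ from the tables above, but the NaN positions are the same.

```python
import numpy as np
import pandas as pd

# Rebuild the question's NaN layout; seeded values, so they differ from the post.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)))
df.iloc[::2, 0] = np.nan  # rows 0, 2, 4, 6, 8
df.iloc[::4, 1] = np.nan  # rows 0, 4, 8
df.iloc[::3, 2] = np.nan  # rows 0, 3, 6, 9

# Default how='any': drop a row if column 1 OR column 2 is NaN.
any_drop = df.dropna(subset=[1, 2])

# The same "or" condition written as an explicit boolean mask.
or_mask = df[1].isna() | df[2].isna()
manual = df[~or_mask]

print(any_drop.index.tolist())  # [1, 2, 5, 7]
assert any_drop.equals(manual)
```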

What I want is an "and"-type drop: a row should be dropped only if it has NaN in both column 1 and column 2. This would leave:

              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN

where only the first row is dropped.

Any ideas?

EDIT: changed data frame values for consistency


asked Aug 24 '16 16:08

Kevin M


1 Answer

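Either of the following works. With how='all', a row is dropped only when every column listed in subset is NaN; with thresh=1, a row is kept if at least one of those columns holds a non-NaN value. The two conditions are equivalent, so both calls drop only row 0 here: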

    df.dropna(subset=[1, 2], how='all')

or

    df.dropna(subset=[1, 2], thresh=1)
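A minimal check of the two calls, reusing the question's NaN layout with seeded random values (the seed and the variable name cols are illustrative assumptions, not part of the answer). Both calls keep the same rows, and they match an explicit "and" mask built with isna().all(axis=1). Because the equivalence of how='all' and thresh=1 does not depend on how many columns are in subset, the same call covers the "two (or more) columns" case.

```python
import numpy as np
import pandas as pd

# Same NaN layout as the question; seeded values, so they differ from the post.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)))
df.iloc[::2, 0] = np.nan
df.iloc[::4, 1] = np.nan
df.iloc[::3, 2] = np.nan

cols = [1, 2]  # hypothetical name; list as many columns as needed

# Drop a row only if every column in cols is NaN.
by_how = df.dropna(subset=cols, how='all')
# Keep a row if at least one column in cols is non-NaN.
by_thresh = df.dropna(subset=cols, thresh=1)
# The same "and" condition as an explicit boolean mask.
by_mask = df[~df[cols].isna().all(axis=1)]

print(by_how.index.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
assert by_how.equals(by_thresh)
assert by_how.equals(by_mask)

# Three columns: only rows where columns 0, 1 and 2 are all NaN are dropped.
three = df.dropna(subset=[0, 1, 2], how='all')
print(three.index.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```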


answered Oct 04 '22 10:10

A. Garcia-Raboso
