
Pandas dropna - store dropped rows

I am using the pandas.DataFrame.dropna method to drop rows that contain NaN. This function returns a dataframe that excludes the dropped rows, as shown in the documentation.

How can I store a copy of the dropped rows as a separate dataframe? Is:

mydataframe[pd.isnull(['list', 'of', 'columns'])]

always guaranteed to return the same rows that dropna drops, assuming that dropna is called with subset=['list', 'of', 'columns']?

asked Dec 15 '15 18:12

wesanyer

2 Answers

You can do this by indexing the original DataFrame with the unary ~ (invert) operator, which selects the rows whose index labels are missing from the NA-free DataFrame.

na_free = df.dropna()
only_na = df[~df.index.isin(na_free.index)]

Another option would be to use the ufunc implementation of ~.

only_na = df[np.invert(df.index.isin(na_free.index))]
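Here is a minimal sketch of the index-based approach above, restricted to a subset of columns as in the question. The small frame and the names mask, only_na_mask, dup and missed are illustrative assumptions, not part of the original answer. It also shows a caveat worth checking on your own data: the index lookup assumes unique index labels, whereas a boolean mask built with isna does not.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (an assumption for this sketch).
df = pd.DataFrame(
    [["a", "b", np.nan], [np.nan, "c", "c"], ["c", "d", "a"]],
    columns=["col1", "col2", "col3"],
)

# Rows that survive dropna when only col2 and col3 are considered.
na_free = df.dropna(subset=["col2", "col3"])

# Rows whose index label is absent from the NaN-free frame were dropped.
only_na = df[~df.index.isin(na_free.index)]

# Alternative: build the row mask directly, without going through the index.
mask = df[["col2", "col3"]].isna().any(axis=1)
only_na_mask = df[mask]

# Caveat: with duplicate index labels the index lookup can miss rows.
dup = pd.DataFrame({"x": [1.0, np.nan]}, index=[0, 0])
dup_na_free = dup.dropna()
missed = dup[~dup.index.isin(dup_na_free.index)]  # empty: label 0 still exists
found = dup[dup.isna().any(axis=1)]               # the NaN row is found
```

If your index may contain duplicates, the mask-based version is probably the safer choice.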

answered Oct 13 '22 15:10

anmol

I was going to leave a comment, but figured I'd write an answer as it started getting fairly complicated. Start with the following data frame:

import pandas as pd
import numpy as np

df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
                  columns=['col1', 'col2', 'col3'])
df

  col1 col2 col3
0    a    b  NaN
1  NaN    c    c
2    c    d    a

Say we want to keep the rows that have NaNs in the columns col2 and col3. One way to do this, based on the answers from this post, is the following:

df.loc[pd.isnull(df[['col2', 'col3']]).any(axis=1)]

  col1 col2 col3
0    a    b  NaN

This gives us the rows that would be dropped if we dropped rows with NaNs in the columns of interest. To keep the remaining rows instead, we can run the same code but use a ~ to invert the selection:

df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)]

  col1 col2 col3
1  NaN    c    c
2    c    d    a

This is equivalent to:

df.dropna(subset=['col2', 'col3'])

Which we can test:

df.dropna(subset=['col2', 'col3']).equals(df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)])

True

You can of course test this on your own, larger DataFrames, but you should get the same answer.
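To get both pieces in one step, the mask can be wrapped in a small helper. This is only a sketch: split_on_na is a hypothetical name, not a pandas function, and it assumes the default dropna behaviour (rows are dropped if any of the selected columns is NaN).

```python
import numpy as np
import pandas as pd


def split_on_na(frame, subset=None):
    """Hypothetical helper: return (kept, dropped) for a row-wise dropna.

    kept is meant to match frame.dropna(subset=subset) with default settings;
    dropped holds the rows that call would have removed.
    """
    cols = frame.columns if subset is None else subset
    # True for every row with at least one NaN in the selected columns.
    has_na = frame[cols].isna().any(axis=1)
    return frame[~has_na], frame[has_na]


df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
                  columns=['col1', 'col2', 'col3'])

kept, dropped = split_on_na(df, subset=['col2', 'col3'])
```

Here kept should contain rows 1 and 2 and dropped should contain row 0, which you can verify against df.dropna(subset=['col2', 'col3']) with .equals as above.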

answered Oct 13 '22 14:10

johnchase
