I am using the pandas.DataFrame.dropna method to drop rows that contain NaN. This function returns a dataframe that excludes the dropped rows, as shown in the documentation.
How can I store a copy of the dropped rows as a separate dataframe? Is:
mydataframe[pd.isnull(['list', 'of', 'columns'])]
always guaranteed to return the same rows that dropna drops, assuming that dropna is called with subset=['list', 'of', 'columns']?
dropna() also gives you the option to remove the rows by searching for null or missing values on specified columns. To search for null values in specific columns, pass the column names to the subset parameter.
The dropna() method removes rows that contain null values. It returns a new DataFrame unless the inplace parameter is set to True, in which case it modifies the original DataFrame and returns None.
With subset, it removes only the rows that have NaN values in the listed columns. To apply the removal to the original DataFrame instead of getting a new one back, pass inplace=True to dropna().
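The behaviour described above can be sketched as follows. The frame and the variable names (kept, work, result) are made up for illustration; only dropna, subset and inplace come from pandas itself.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data).
df = pd.DataFrame(
    {"col1": ["a", np.nan, "c"], "col2": ["b", "c", "d"], "col3": [np.nan, "c", "a"]}
)

# Default: a new DataFrame is returned and df itself is left untouched.
kept = df.dropna(subset=["col2", "col3"])

# inplace=True: the call returns None and modifies the frame it is called on.
work = df.copy()
result = work.dropna(subset=["col2", "col3"], inplace=True)
```

Note that once rows are dropped in place they cannot be recovered from that frame, so any copy of the dropped rows has to be taken beforehand.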
You can do this by indexing the original DataFrame, using the unary ~ (invert) operator to select the inverse of the NA-free DataFrame.
na_free = df.dropna()
only_na = df[~df.index.isin(na_free.index)]
Another option would be to use the ufunc implementation of ~.
only_na = df[np.invert(df.index.isin(na_free.index))]
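One caveat worth checking, sketched below under the assumption that the index may contain duplicate labels: the index.isin approach compares labels, not row positions, so a dropped row that shares its label with a kept row is not recovered. The frame and the names by_index and by_mask are hypothetical.

```python
import numpy as np
import pandas as pd

# Index with a duplicate label "r", chosen only to illustrate the caveat.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0]}, index=["r", "r", "s"])

na_free = df.dropna()

# Label-based: "r" still appears in na_free.index, so the NaN row is missed.
by_index = df[~df.index.isin(na_free.index)]

# Row-wise boolean mask: does not depend on index labels being unique.
by_mask = df[df.isna().any(axis=1)]
```

With a unique index the two selections should agree; with duplicates, the mask-based version is the safer choice.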
I was going to leave a comment, but figured I'd write an answer as it started getting fairly complicated. Start with the following data frame:
import pandas as pd
import numpy as np

df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']], columns=['col1', 'col2', 'col3'])
df

  col1 col2 col3
0    a    b  NaN
1  NaN    c    c
2    c    d    a
Say we want to keep the rows with NaNs in the columns col2 and col3. One way to do this is the following, which is based on the answers from this post:
df.loc[pd.isnull(df[['col2', 'col3']]).any(axis=1)]

  col1 col2 col3
0    a    b  NaN
So this gives us the rows that would be dropped if we dropped rows with NaNs in the columns of interest. To keep the remaining rows instead, we can run the same code but use a ~ to invert the selection:
df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)]

  col1 col2 col3
1  NaN    c    c
2    c    d    a
This is equivalent to:
df.dropna(subset=['col2', 'col3'])
Which we can test:
df.dropna(subset=['col2', 'col3']).equals(df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)])

True
You can of course test this on your own larger dataframes but should get the same answer.
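To get both pieces in one step, the mask can be wrapped in a small helper. This is only a sketch: split_on_na is a hypothetical name, not a pandas function, and it assumes dropna's default how='any' and a subset given as a list of column names.

```python
import numpy as np
import pandas as pd


def split_on_na(frame, subset=None):
    """Return (kept, dropped) for a row-wise NaN check.

    Hypothetical helper: kept is meant to match frame.dropna(subset=subset)
    with the default how='any'; subset must be a list of column names.
    """
    cols = frame if subset is None else frame[subset]
    mask = cols.isna().any(axis=1)  # True for rows that dropna would remove
    return frame.loc[~mask], frame.loc[mask]


df = pd.DataFrame(
    [["a", "b", np.nan], [np.nan, "c", "c"], ["c", "d", "a"]],
    columns=["col1", "col2", "col3"],
)
kept, dropped = split_on_na(df, subset=["col2", "col3"])
```

Because both frames come from the same mask, every row of df ends up in exactly one of them.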