Drop rows containing empty cells from a pandas DataFrame

People also ask

How do you drop rows containing missing values?

DataFrame-dropna() function The dropna() function is used to remove missing values. Determine if rows or columns which contain missing values are removed. 0, or 'index' : Drop rows which contain missing values. 1, or 'columns' : Drop columns which contain missing value.

How do you drop blank values in a DataFrame?

In order to drop a null values from a dataframe, we used dropna() function this function drop Rows/Columns of datasets with Null values in different ways. Parameters: axis: axis takes int or string value for rows/columns. Input can be 0 or 1 for Integer and 'index' or 'columns' for String.

Pandas will recognise a value as null if it is a np.nan object, which will print as NaN in the DataFrame. Your missing values are probably empty strings, which Pandas doesn't recognise as null. To fix this, you can convert the empty stings (or whatever is in your empty cells) to np.nan objects using replace(), and then call dropna()on your DataFrame to delete rows with null tenants.

To demonstrate, we create a DataFrame with some random values and some empty strings in a Tenants column:

>>> import pandas as pd
>>> import numpy as np
>>> 
>>> df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
>>> df['Tenant'] = np.random.choice(['Babar', 'Rataxes', ''], 10)
>>> print df

          A         B   Tenant
0 -0.588412 -1.179306    Babar
1 -0.008562  0.725239         
2  0.282146  0.421721  Rataxes
3  0.627611 -0.661126    Babar
4  0.805304 -0.834214         
5 -0.514568  1.890647    Babar
6 -1.188436  0.294792  Rataxes
7  1.471766 -0.267807    Babar
8 -1.730745  1.358165  Rataxes
9  0.066946  0.375640

Now we replace any empty strings in the Tenants column with np.nan objects, like so:

>>> df['Tenant'].replace('', np.nan, inplace=True)
>>> print df

          A         B   Tenant
0 -0.588412 -1.179306    Babar
1 -0.008562  0.725239      NaN
2  0.282146  0.421721  Rataxes
3  0.627611 -0.661126    Babar
4  0.805304 -0.834214      NaN
5 -0.514568  1.890647    Babar
6 -1.188436  0.294792  Rataxes
7  1.471766 -0.267807    Babar
8 -1.730745  1.358165  Rataxes
9  0.066946  0.375640      NaN

Now we can drop the null values:

>>> df.dropna(subset=['Tenant'], inplace=True)
>>> print df

          A         B   Tenant
0 -0.588412 -1.179306    Babar
2  0.282146  0.421721  Rataxes
3  0.627611 -0.661126    Babar
5 -0.514568  1.890647    Babar
6 -1.188436  0.294792  Rataxes
7  1.471766 -0.267807    Babar
8 -1.730745  1.358165  Rataxes

Pythonic + Pandorable: `df[df['col'].astype(bool)]`

Empty strings are falsy, which means you can filter on bool values like this:

df = pd.DataFrame({
    'A': range(5),
    'B': ['foo', '', 'bar', '', 'xyz']
})
df
   A    B
0  0  foo
1  1     
2  2  bar
3  3     
4  4  xyz

df['B'].astype(bool)                                                                                                                      
0     True
1    False
2     True
3    False
4     True
Name: B, dtype: bool

df[df['B'].astype(bool)]                                                                                                                  
   A    B
0  0  foo
2  2  bar
4  4  xyz

If your goal is to remove not only empty strings, but also strings only containing whitespace, use str.strip beforehand:

df[df['B'].str.strip().astype(bool)]
   A    B
0  0  foo
2  2  bar
4  4  xyz

Faster than you Think

.astype is a vectorised operation, this is faster than every option presented thus far. At least, from my tests. YMMV.

Here is a timing comparison, I've thrown in some other methods I could think of.

enter image description here

Benchmarking code, for reference:

import pandas as pd
import perfplot

df1 = pd.DataFrame({
    'A': range(5),
    'B': ['foo', '', 'bar', '', 'xyz']
})

perfplot.show(
    setup=lambda n: pd.concat([df1] * n, ignore_index=True),
    kernels=[
        lambda df: df[df['B'].astype(bool)],
        lambda df: df[df['B'] != ''],
        lambda df: df[df['B'].replace('', np.nan).notna()],  # optimized 1-col
        lambda df: df.replace({'B': {'': np.nan}}).dropna(subset=['B']),  
    ],
    labels=['astype', "!= ''", "replace + notna", "replace + dropna", ],
    n_range=[2**k for k in range(1, 15)],
    xlabel='N',
    logx=True,
    logy=True,
    equality_check=pd.DataFrame.equals)

value_counts omits NaN by default so you're most likely dealing with "".

So you can just filter them out like

filter = df["Tenant"] != ""
dfNew = df[filter]

There's a situation where the cell has white space, you can't see it, use

df['col'].replace('  ', np.nan, inplace=True)

to replace white space as NaN, then

df= df.dropna(subset=['col'])

You can use this variation:

import pandas as pd
vals = {
    'name' : ['n1', 'n2', 'n3', 'n4', 'n5', 'n6', 'n7'],
    'gender' : ['m', 'f', 'f', 'f',  'f', 'c', 'c'],
    'age' : [39, 12, 27, 13, 36, 29, 10],
    'education' : ['ma', None, 'school', None, 'ba', None, None]
}
df_vals = pd.DataFrame(vals) #converting dict to dataframe

This will output(** - highlighting only desired rows):

   age education gender name
0   39        ma      m   n1 **
1   12      None      f   n2    
2   27    school      f   n3 **
3   13      None      f   n4
4   36        ba      f   n5 **
5   29      None      c   n6
6   10      None      c   n7

So to drop everything that does not have an 'education' value, use the code below:

df_vals = df_vals[~df_vals['education'].isnull()]

('~' indicating NOT)

Result:

   age education gender name
0   39        ma      m   n1
2   27    school      f   n3
4   36        ba      f   n5

Related questions
                            
                                List comprehension rebinds names even after scope of comprehension. Is this right?
                            
                                Getting SyntaxError for print with keyword argument end=' '
                            
                                How is __eq__ handled in Python and in what order?
                            
                                How to surround selected text in PyCharm like with Sublime Text
                            
                                How to pip or easy_install tkinter on Windows
                            
                                Render HTML to PDF in Django site
                            
                                logging.info doesn't show up on console but warn and error do
                            
                                Tool to convert Python code to be PEP8 compliant
                            
                                print memory address of Python variable [duplicate]
                            
                                Get enumeration name by value [duplicate]
                            
                                How exactly does the python any() function work?
                            
                                Importing an ipynb file from another ipynb file?
                            
                                How to change the color of the axis, ticks and labels for a plot in matplotlib
                            
                                Matplotlib - How to plot a high resolution graph?
                            
                                Pairwise crossproduct in Python [duplicate]
                            
                                How do I calculate the MD5 checksum of a file in Python? [duplicate]
                            
                                How to avoid HTTP error 429 (Too Many Requests) python
                            
                                How to change a dataframe column from String type to Double type in PySpark?
                            
                                How to convert string to Title Case in Python?
                            
                                Pip - Fatal error in launcher: Unable to create process using '"'

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Drop rows containing empty cells from a pandas DataFrame

Tags:

python

pandas

dataframe

drop

People also ask

Pythonic + Pandorable: `df[df['col'].astype(bool)]`

Faster than you Think

Recent Activity

Donate For Us

Drop rows containing empty cells from a pandas DataFrame

Tags:

python

pandas

dataframe

drop

People also ask

Pythonic + Pandorable: df[df['col'].astype(bool)]

Faster than you Think

Related questions

Recent Activity

Donate For Us

Pythonic + Pandorable: `df[df['col'].astype(bool)]`