Drop columns in a pandas dataframe based on the % of null values

Tags: python, pandas

I have a dataframe with around 60 columns and 2 million rows. Some of the columns are mostly empty. I calculated the % of null values in each column using this function.

import pandas as pd

def missing_values_table(df):
    # Count the nulls once and reuse the result for the percentage
    mis_val = df.isnull().sum()
    mis_val_percent = 100 * mis_val / len(df)
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    # concat labels the two unnamed Series 0 and 1, so rename them
    mis_val_table_ren_columns = mis_val_table.rename(
        columns={0: 'Missing Values', 1: '% of Total Values'})
    return mis_val_table_ren_columns

Now I want to drop the columns that have more than 80% (for example) of their values missing. I tried the following code, but it does not work.

df = df.drop(df.columns[df.apply(lambda col: col.isnull().sum()/len(df) > 0.80)], axis=1)

Thank you in advance. I hope I'm not missing something very basic.

I receive this error:

TypeError: ("'generator' object is not callable", u'occurred at index Unique_Key')

379

asked Oct 25 '17 18:10

user2656075

2 Answers

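You can use dropna() with the thresh parameter, which sets the minimum number of non-null values a column must have to be kept. To drop columns with more than 80% of their values missing, require at least 20% non-null values: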

thresh = len(df) * .2
df.dropna(thresh = thresh, axis = 1, inplace = True)
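
To see how thresh behaves, here is a minimal sketch on a toy frame. The column names, the data, and the keep_fraction variable are made up for illustration; rounding up with math.ceil is my own choice to keep thresh an integer, not something the snippet above does.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical toy data: 10 rows; "b" is 90% missing, "c" is 50% missing.
df = pd.DataFrame({
    "a": range(10),
    "b": [1.0] + [np.nan] * 9,
    "c": [1.0, np.nan] * 5,
})

# Keep a column only if at least 20% of its values are non-null,
# i.e. drop it when more than 80% are missing.
keep_fraction = 0.2

# thresh is a count of non-null values, so round up to a whole number.
min_non_null = math.ceil(len(df) * keep_fraction)

# Without inplace=True, dropna returns a new frame and leaves df untouched.
result = df.dropna(axis=1, thresh=min_non_null)
print(list(result.columns))
```

Here "b" has only one non-null value, which is below the required two, so it is dropped, while "c" survives with five.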

179

answered Nov 02 '22 19:11

Vaishali

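import pandas as pd

def missing_values(df, percentage):
    # Percentage of null values in each column, indexed by column name
    percent_missing = df.isnull().sum() * 100 / len(df)
    missing_value_df = pd.DataFrame({'column_name': df.columns,
                                     'percent_missing': percent_missing})

    # Names of the columns whose missing share exceeds the given percentage
    missing_drop = list(missing_value_df[missing_value_df.percent_missing > percentage].column_name)
    # drop returns a new DataFrame; the caller's df is left unchanged
    df = df.drop(missing_drop, axis=1)
    return df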
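
The same idea can probably be written more compactly with a boolean mask over the columns. This is only a sketch of an alternative, not the function above: the name drop_sparse_columns and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_missing_pct):
    """Return a new frame without columns whose missing share exceeds max_missing_pct."""
    # isnull().mean() is the fraction of nulls per column (True counts as 1).
    pct_missing = df.isnull().mean() * 100
    # Keep only the columns at or below the allowed percentage.
    return df.loc[:, pct_missing <= max_missing_pct]


# Hypothetical toy data: "b" is 90% missing, "c" is 50% missing.
df = pd.DataFrame({
    "a": range(10),
    "b": [1.0] + [np.nan] * 9,
    "c": [1.0, np.nan] * 5,
})

trimmed = drop_sparse_columns(df, 80)
print(list(trimmed.columns))
```

Because the mask is built from a single pass over df.isnull(), this avoids constructing the intermediate helper DataFrame.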

answered Nov 02 '22 20:11

Frederico Guerra
