How to remove columns with too many missing values in Python

I'm working on a machine learning problem where many of the features have missing values. There are hundreds of features, and I would like to remove those with too many missing values (for example, features with more than 80% missing values). How can I do that in Python?

My data is a pandas DataFrame.

asked Aug 04 '17 20:08

HHH

2 Answers

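You can use pandas' DataFrame.dropna() with its thresh argument. thresh is the minimum number of non-null values a column must have in order to be kept, so to drop columns that are more than 80% missing, require at least 20% of the rows to be non-null:

import math

# Keep only columns with at least 20% non-null values (drops those more than 80% missing).
min_non_null = math.ceil(len(yourdf) * 0.20)
yourdf = yourdf.dropna(thresh=min_non_null, axis=1)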
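Because thresh counts non-null values rather than missing ones, it is easy to get the direction backwards. A minimal sketch on a made-up DataFrame (the variable `toy` and its column names are hypothetical, not from the question) compares a 20% and an 80% non-null requirement:

```python
import math

import numpy as np
import pandas as pd

# Toy data (hypothetical): 10 rows, with columns that are 0%, 50% and 90% missing.
toy = pd.DataFrame({
    "full": np.arange(10, dtype=float),
    "half_missing": [1.0, np.nan] * 5,
    "mostly_missing": [1.0] + [np.nan] * 9,
})

# thresh is the minimum number of non-null values a column needs to be kept.
keep_20 = toy.dropna(axis=1, thresh=math.ceil(len(toy) * 0.20))  # needs >= 2 non-null
keep_80 = toy.dropna(axis=1, thresh=math.ceil(len(toy) * 0.80))  # needs >= 8 non-null

print(list(keep_20.columns))  # ['full', 'half_missing']
print(list(keep_80.columns))  # ['full']
```

With the 20% requirement only the 90%-missing column is dropped; with the 80% requirement the half-empty column is dropped as well.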

answered Oct 17 '22 08:10

singmotor

More generally, you can compute the fraction of missing values in each column, select the columns where more than 80% of the values are null, and drop those columns from the DataFrame.

pct_null = df.isnull().sum() / len(df)
missing_features = pct_null[pct_null > 0.80].index
df.drop(missing_features, axis=1, inplace=True)
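If the cutoff needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: the function name `drop_sparse_columns`, its `max_missing` parameter, and the example data are assumptions for illustration. It returns a new DataFrame instead of modifying the input in place.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(frame, max_missing=0.80):
    """Return a new DataFrame without columns whose missing fraction exceeds max_missing."""
    # isna().mean() gives the fraction of missing values per column.
    missing_fraction = frame.isna().mean()
    to_drop = missing_fraction[missing_fraction > max_missing].index
    return frame.drop(columns=to_drop)


# Hypothetical example data: "b" is 60% missing, "c" is entirely missing.
example = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [np.nan, np.nan, np.nan, 4.0, 5.0],
    "c": [np.nan] * 5,
})

cleaned = drop_sparse_columns(example)
print(list(cleaned.columns))  # ['a', 'b']
```

Passing a lower cutoff, for instance `drop_sparse_columns(example, max_missing=0.5)`, would also drop column "b".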

answered Oct 17 '22 07:10

vielkind

Tags: python, pandas, dataframe, missing-data, scikit-learn
