Is it possible to have missing values in scikit-learn? How should they be represented? I couldn't find any documentation about that.


Missing values in scikits machine learning

1 Answer

~~Missing values are simply not supported in scikit-learn. There has been discussion on the mailing list about this before, but no attempt to actually write code to handle them.~~

~~Whatever you do, don't use NaN to encode missing values, since many of the algorithms refuse to handle samples containing NaNs.~~

The struck-through text above is outdated: later releases of scikit-learn added a class `Imputer` that does simple, per-feature missing-value imputation. You can feed it arrays containing NaNs and have those values replaced by the mean, median, or mode of the corresponding feature.
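As a rough illustration of per-feature imputation, here is a minimal sketch. It assumes scikit-learn 0.20 or newer, where this functionality is exposed as `sklearn.impute.SimpleImputer`; the original `sklearn.preprocessing.Imputer` class was removed in 0.22. The array `X` is made-up example data, not something from the question.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: rows are samples, columns are features.
# np.nan marks the missing entries.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [7.0, np.nan],
])

# strategy may be "mean", "median" or "most_frequent" (the mode).
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit() learns one fill value per column; transform() substitutes it.
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Each NaN is replaced by the mean of the observed values in its own column, so the first column is filled with 4.0 and the second with 2.5. In practice you would fit the imputer on training data only and reuse it to transform test data, for example by placing it in a `Pipeline`.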

165

answered Oct 02 '22 10:10

Fred Foo

Tags: python, missing-data, machine-learning, scikit-learn, scikits

Vladtn
