Implementation of sklearn.impute.IterativeImputer

Tags: python, dataframe, missing-data, imputation, scikit-learn

Consider the following data, which contains some NaN values:

    Column-1  Column-2  Column-3  Column-4  Column-5
0        NaN      15.0      63.0       8.0      40.0
1       60.0      51.0       NaN      54.0      31.0
2       15.0      17.0      55.0      80.0       NaN
3       54.0      43.0      70.0      16.0      73.0
4       94.0      31.0      94.0      29.0      53.0
5       99.0      52.0      77.0      91.0      58.0
6       84.0      19.0      36.0       NaN      97.0
7       41.0      91.0      62.0      67.0      68.0
8       44.0      38.0      27.0      53.0      37.0
9       58.0       NaN      63.0      57.0      28.0
10      66.0      68.0      89.0      36.0      47.0
11       7.0      81.0       5.0      99.0      16.0
12      43.0      55.0      64.0      88.0       NaN
13       8.0      90.0      91.0      44.0       4.0
14      29.0      52.0      94.0      71.0      47.0
15      22.0      21.0      68.0      61.0      38.0
16      76.0      36.0      70.0      99.0      50.0
17      38.0      31.0      66.0      79.0      99.0
18      94.0      22.0      92.0      39.0      58.0

I want to replace the NaN values using sklearn.impute.IterativeImputer. A friend helped me with the following code:

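import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required before importing IterativeImputer)
from sklearn.impute import IterativeImputer

imp = IterativeImputer(missing_values=np.nan, sample_posterior=False,
                       max_iter=10, tol=0.001,
                       n_nearest_features=4, initial_strategy='median')
imp.fit(data)  # data: the DataFrame shown above
# transform() returns a float array; astype(int) truncates the imputed values
imputed_data = pd.DataFrame(data=imp.transform(data),
                            columns=['Column-1', 'Column-2', 'Column-3', 'Column-4', 'Column-5']).astype(int)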
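To make the snippet above reproducible, here is a minimal self-contained sketch that rebuilds the example DataFrame from the table in the question and runs the same imputer settings. The variable names `columns` and `imputed` are illustrative assumptions; IterativeImputer is still experimental in scikit-learn, which is why the `enable_iterative_imputer` import has to come first.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

nan = np.nan
columns = ['Column-1', 'Column-2', 'Column-3', 'Column-4', 'Column-5']
# The 19 x 5 table from the question; nan marks the missing entries
data = pd.DataFrame([
    [nan, 15, 63, 8, 40],
    [60, 51, nan, 54, 31],
    [15, 17, 55, 80, nan],
    [54, 43, 70, 16, 73],
    [94, 31, 94, 29, 53],
    [99, 52, 77, 91, 58],
    [84, 19, 36, nan, 97],
    [41, 91, 62, 67, 68],
    [44, 38, 27, 53, 37],
    [58, nan, 63, 57, 28],
    [66, 68, 89, 36, 47],
    [7, 81, 5, 99, 16],
    [43, 55, 64, 88, nan],
    [8, 90, 91, 44, 4],
    [29, 52, 94, 71, 47],
    [22, 21, 68, 61, 38],
    [76, 36, 70, 99, 50],
    [38, 31, 66, 79, 99],
    [94, 22, 92, 39, 58],
], columns=columns, dtype=float)

# Same settings as in the question (default estimator: BayesianRidge)
imp = IterativeImputer(missing_values=np.nan, sample_posterior=False,
                       max_iter=10, tol=0.001,
                       n_nearest_features=4, initial_strategy='median')
imp.fit(data)
imputed = pd.DataFrame(imp.transform(data), columns=columns)
print(imputed.round(1))
```

Only the NaN cells are filled in; every known value is passed through unchanged. No random_state is set, so the imputed values may differ slightly between runs.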

The imputed_data is:


    Column-1  Column-2  Column-3  Column-4  Column-5
0         59        15        63         8        40
1         60        51        66        54        31
2         15        17        55        80        48
3         54        43        70        16        73
4         94        31        94        29        53
5         99        52        77        91        58
6         84        19        36        59        97
7         41        91        62        67        68
8         44        38        27        53        37
9         58        46        63        57        28
10        66        68        89        36        47
11         7        81         5        99        16
12        43        55        64        88        47
13         8        90        91        44         4
14        29        52        94        71        47
15        22        21        68        61        38
16        76        36        70        99        50
17        38        31        66        79        99
18        94        22        92        39        58

According to the IterativeImputer documentation, the default estimator is BayesianRidge(). However, if I use another estimator, such as estimator=ExtraTreesRegressor(n_estimators=10, random_state=0) in the code below, I get a warning. The code:

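from sklearn.ensemble import ExtraTreesRegressor

# Same settings as above, but with a tree-based estimator instead of the default BayesianRidge()
imp = IterativeImputer(estimator=ExtraTreesRegressor(n_estimators=10, random_state=0),
                       missing_values=np.nan, sample_posterior=False,
                       max_iter=10, tol=0.001,
                       n_nearest_features=4, initial_strategy='median')
imp.fit(data)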

The message:

C:\Users\...\sklearn\impute\_iterative.py:599: ConvergenceWarning: [IterativeImputer] Early stopping criterion not reached.
  " reached.", ConvergenceWarning)

My question: is this approach correct, or should I do something to fix the warning?
Thank you.


asked Jul 22 '19 21:07

k.ko3n

2 Answers

Other users report the same issue here:

https://github.com/scikit-learn/scikit-learn/issues/14338


answered Oct 17 '22 02:10

mel1

You are getting this warning because of the parameters max_iter=10 and tol=0.001 set for IterativeImputer().

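The stopping criterion, max(abs(X_t - X_{t-1})) / max(abs(X[known_vals])) < tol, where X_t is the imputed data after round t, is not met within 10 iterations (max_iter=10). Early stopping is only applied when sample_posterior=False, as in your code.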
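To see what tol controls, the criterion can be written out in NumPy. This is a minimal sketch of the formula as the documentation states it, not scikit-learn's internal code; the helper name `early_stopping_met` and the toy arrays are hypothetical. It compares the largest absolute change between two successive imputation rounds with the largest absolute known value.

```python
import numpy as np


def early_stopping_met(X_prev, X_curr, X_orig, tol):
    """Hypothetical helper implementing the documented rule
    max(abs(X_t - X_{t-1})) / max(abs(X[known_vals])) < tol."""
    known = ~np.isnan(X_orig)                  # entries that were never missing
    change = np.max(np.abs(X_curr - X_prev))   # largest change between two rounds
    scale = np.max(np.abs(X_orig[known]))      # largest known magnitude
    return change / scale < tol


# Toy example: one missing entry whose imputed value moves from 50.0 to 50.05
X_orig = np.array([[1.0, np.nan], [3.0, 99.0]])
X_prev = np.array([[1.0, 50.0], [3.0, 99.0]])
X_curr = np.array([[1.0, 50.05], [3.0, 99.0]])

# change / scale is about 0.05 / 99, roughly 0.0005
print(early_stopping_met(X_prev, X_curr, X_orig, tol=0.001))   # True
print(early_stopping_met(X_prev, X_curr, X_orig, tol=0.0001))  # False
```

With the question's data the largest known value is 99, so with tol=0.001 the imputer keeps iterating while any imputed entry still moves by 0.099 or more between rounds.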

See the description of max_iter in the Parameters section of the sklearn.impute.IterativeImputer documentation.

One workaround is to set max_iter higher, giving the imputer more rounds in which to meet the criterion.
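A minimal sketch of that workaround, assuming the question's `data` (rebuilt here so the block runs on its own): it refits the ExtraTreesRegressor version for a couple of max_iter values, records whether a ConvergenceWarning was raised, and reads the fitted `n_iter_` attribute, which is below max_iter when the criterion was met early. The max_iter values tried and the `results` dict are arbitrary illustrations. Tree ensembles can keep shifting their predictions from round to round, so a larger max_iter is not guaranteed to make the warning disappear.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.exceptions import ConvergenceWarning
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

nan = np.nan
# The 19 x 5 table from the question
data = pd.DataFrame([
    [nan, 15, 63, 8, 40], [60, 51, nan, 54, 31], [15, 17, 55, 80, nan],
    [54, 43, 70, 16, 73], [94, 31, 94, 29, 53], [99, 52, 77, 91, 58],
    [84, 19, 36, nan, 97], [41, 91, 62, 67, 68], [44, 38, 27, 53, 37],
    [58, nan, 63, 57, 28], [66, 68, 89, 36, 47], [7, 81, 5, 99, 16],
    [43, 55, 64, 88, nan], [8, 90, 91, 44, 4], [29, 52, 94, 71, 47],
    [22, 21, 68, 61, 38], [76, 36, 70, 99, 50], [38, 31, 66, 79, 99],
    [94, 22, 92, 39, 58],
], columns=['Column-1', 'Column-2', 'Column-3', 'Column-4', 'Column-5'], dtype=float)

results = {}  # max_iter -> (rounds actually run, whether ConvergenceWarning fired)
for max_iter in (10, 30):
    imp = IterativeImputer(estimator=ExtraTreesRegressor(n_estimators=10, random_state=0),
                           missing_values=np.nan, sample_posterior=False,
                           max_iter=max_iter, tol=0.001,
                           n_nearest_features=4, initial_strategy='median')
    # Record warnings instead of printing them, so we can inspect them
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always', ConvergenceWarning)
        imp.fit(data)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    results[max_iter] = (imp.n_iter_, warned)
    print(f"max_iter={max_iter}: n_iter_={imp.n_iter_}, ConvergenceWarning={warned}")
```

If n_iter_ reaches max_iter for every value tried, the imputer still returns the imputations computed during the final round; the warning only means those values were still changing by more than tol when fitting stopped.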

answered Oct 17 '22 01:10

akhil penta
