
Deprecation warning in scikit-learn svmlight format loader

I'm getting a deprecation warning that I haven't seen before in an IPython notebook I wrote. The output is the following:

X,y = load_svmlight_file('./GasSensorArray/batch2.dat')
/Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/sklearn/datasets/svmlight_format.py:137: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
return _load_svmlight_file(f, dtype, multilabel, zero_based, query_id)
/Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/sklearn/datasets/svmlight_format.py:137: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
return _load_svmlight_file(f, dtype, multilabel, zero_based, query_id)
...

Any thoughts on what the issue might be? I took another look at my data file and don't see any obvious problem. I'm not sure what I changed in my system setup that would have caused this. I have scikit-learn 0.14.1 installed.

Chris asked Nov 19 '13 23:11




2 Answers

You probably upgraded NumPy: this deprecation warning was introduced in numpy 1.8.0. It is explained in this pull request, with a continuation in this PR.
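A quick way to check whether the installed numpy is at or past the release mentioned above. This is only a sketch: `version_tuple` is a hypothetical helper written for this example, not part of numpy, and it assumes the version string starts with a numeric "major.minor".

```python
import numpy as np


def version_tuple(version):
    """Parse the leading 'major.minor' of a version string into a tuple of ints.

    Hypothetical helper for this example; assumes the first two
    dot-separated fields are plain integers (e.g. "1.8.0").
    """
    parts = version.split(".")[:2]
    return tuple(int(p) for p in parts)


installed = version_tuple(np.__version__)
print("numpy version: %s" % np.__version__)

if installed >= (1, 8):
    print("numpy is 1.8 or newer, so the non-integer index warning can appear")
else:
    print("numpy is older than 1.8")
```

If the version printed is 1.8 or newer and the warning only started appearing recently, a numpy upgrade is a likely cause.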

A brief look through the sklearn issue tracker didn't turn up any related issues. You may want to search more thoroughly and file a bug report if you find nothing.

alko answered Oct 06 '22 03:10



After you upgrade numpy, it gives you this deprecation warning whenever you try to index an array using non-integer numbers. In sklearn there are many places where the data type is a floating point number even though the indices are all integer values when computed.

So whenever you index an array in numpy, you need to make sure the indices are integer typed. But this is not the case in many places in sklearn. The fix is sometimes trivial (for example, use // instead of / when computing indices by division), sometimes not, but for now, no worries: it's just a warning.
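To illustrate the point about integer-typed indices, here is a minimal sketch. It is not the sklearn code that triggers the warning; the array `data` and the variable names are made up for the example.

```python
import numpy as np

data = np.arange(10) * 10   # example array: 0, 10, ..., 90
n = len(data)

# True division gives a float, even when the value is a whole number.
mid_float = n / 2.0         # 5.0 (float)

# Floor division on integers keeps the index an integer.
mid_int = n // 2            # 5 (int)
middle = data[mid_int]

# Index arrays computed with floating point math have a float dtype,
# even if every value is whole. Cast them explicitly before indexing.
positions = np.floor(np.linspace(0, n - 1, 4))   # float64 array
int_positions = positions.astype(np.intp)        # integer index array
picked = data[int_positions]

print(type(mid_float).__name__, type(mid_int).__name__)
print(positions.dtype, int_positions.dtype)
print(middle, picked)
```

Indexing with `mid_float` or `positions` directly is what numpy 1.8 warns about; the integer versions avoid it.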

adrin answered Oct 06 '22 02:10
