I'm getting a new deprecation warning in an IPython notebook I wrote that I've not seen before. What I'm seeing is the following:
X,y = load_svmlight_file('./GasSensorArray/batch2.dat')
/Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/sklearn/datasets/svmlight_format.py:137: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
return _load_svmlight_file(f, dtype, multilabel, zero_based, query_id)
/Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/sklearn/datasets/svmlight_format.py:137: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
return _load_svmlight_file(f, dtype, multilabel, zero_based, query_id)
...
Any thoughts on what might be the issue here? I took another look at my data file and at first glance, I don't see any obvious issue. I'm not sure what I changed in my system setup that would have caused this. I've got v. 0.14.1 of scikit-learn installed.
with warnings.catch_warnings(): warnings.simplefilter("ignore") from sklearn import preprocessing /usr/local/lib/python3.5/site-packages/sklearn/utils/fixes.
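For reference, here is a minimal sketch of how the standard-library warnings module can silence or record a warning like the one in the question. The function noisy_call is a hypothetical stand-in for the library call that emits the warning; it is not part of scikit-learn.

```python
import warnings


def noisy_call():
    # Hypothetical stand-in for a library call that emits the warning.
    warnings.warn(
        "using a non-integer number instead of an integer "
        "will result in an error in the future",
        DeprecationWarning,
    )
    return "loaded"


# Inside this block every warning is ignored; the previous filters are
# restored automatically when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_call()

# To inspect warnings instead of hiding them, record them in a list.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_call()

categories = [w.category for w in caught]
```

Keeping the filter inside a with block limits the suppression to the one call that is known to be noisy, so unrelated warnings elsewhere in the notebook still show up.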
The svmlight / libsvm format is a text-based format, with one sample per line. It does not store zero-valued features, so it is suitable for sparse datasets. The first element of each line can be used to store a target variable to predict. This is the default format for both the svmlight and the libsvm command line programs.
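To make the layout concrete, here is a small sketch that loads two hand-written lines in this format from an in-memory buffer. The sample values are made up for illustration, and the feature indices are assumed to be one-based, which is why zero_based=False is passed explicitly.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Each line is "<target> <index>:<value> ..."; features that are zero
# are simply left out.
raw = b"1 1:0.5 3:1.2\n-1 2:0.7\n"

# n_features fixes the matrix width; zero_based=False says the indices
# in the file start at 1.
X, y = load_svmlight_file(BytesIO(raw), n_features=3, zero_based=False)

# X is a scipy.sparse CSR matrix; convert to dense only for inspection.
dense = X.toarray()
```

Only three values are stored for the two samples, even though the dense matrix has six entries.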
The Estimator API is one of the main APIs implemented by scikit-learn. It provides a consistent interface for a wide range of ML applications, which is why all machine learning algorithms in scikit-learn are implemented through it. An estimator is an object that learns from data by fitting it.
We are given samples of each of the 10 possible classes (the digits zero through nine) on which we fit an estimator to be able to predict the classes to which unseen samples belong. In scikit-learn, an estimator for classification is a Python object that implements the methods fit(X, y) and predict(T).
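A minimal sketch of the fit / predict pattern on the bundled digits dataset follows. The choice of SVC and its gamma and C values is an assumption made for illustration, not a recommendation.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Any classifier exposing fit(X, y) and predict(T) would work here;
# SVC with these hyperparameters is just one illustrative choice.
clf = svm.SVC(gamma=0.001, C=100.0)

# Fit on every sample except the last one...
clf.fit(digits.data[:-1], digits.target[:-1])

# ...and predict the class of the held-out sample.
prediction = clf.predict(digits.data[-1:])
```

The result is an array with one predicted digit per row passed to predict.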
load_svmlight_file loads datasets in the svmlight / libsvm format into a sparse CSR matrix.
sklearn.datasets.dump_svmlight_file(X, y, f, *, zero_based=True, comment=None, query_id=None, multilabel=False): Dump the dataset in svmlight / libsvm file format.
If the file contains pairwise preference constraints (known as “qid” in the svmlight format), these are ignored unless the query_id parameter is set to True.
The comment argument should be either a Unicode string, which will be encoded as UTF-8, or an ASCII byte string. If a comment is given, it will be preceded by one that identifies the file as having been dumped by scikit-learn.
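The following sketch round-trips a tiny dataset through dump_svmlight_file and load_svmlight_file, using both a comment and query ids. The array contents and the comment text are invented for illustration.

```python
from io import BytesIO

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

X = np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 3.0], [0.0, 0.0, 4.0]])
y = np.array([1, 0, 1])
qid = np.array([1, 1, 2])  # one query id per sample

# Dump to an in-memory binary buffer instead of a file on disk.
buf = BytesIO()
dump_svmlight_file(
    X, y, buf, zero_based=True, comment="toy ranking data", query_id=qid
)
text = buf.getvalue().decode("utf-8")

# Load it back; query_id=True makes the loader return the qid column
# as a third value instead of ignoring it.
buf.seek(0)
X_back, y_back, qid_back = load_svmlight_file(
    buf, n_features=3, zero_based=True, query_id=True
)
```

Printing text shows the comment lines at the top of the dump, each starting with #, followed by one line per sample.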
You probably upgraded your numpy version; this deprecation warning was introduced in numpy 1.8.0. It is explained in this pull request, with a continuation in this PR.
Briefly browsing the sklearn issue tracker, I haven't found any related issues. A more thorough search may turn something up; if not, consider filing a bug report.
After you upgrade numpy, it gives you this deprecation warning whenever you try to index an array using non-integer numbers. In sklearn there are many places where the data type is a floating point number even though the indices are all integer values when computed.
So whenever you index an array in numpy, you need to make sure the indices are integer typed. But this is not the case in many places in sklearn. The fix is sometimes trivial (for example, use // instead of / when computing indices using divisions), sometimes not, but for now, no worries, it's just a warning.
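As a rough illustration of the point above, here is a sketch of keeping indices integer typed. The array and variable names are made up; note that / only produces a float for integer operands under Python 3 or with from __future__ import division.

```python
import numpy as np

values = np.arange(10)
n = len(values)

# True division yields a float even when the result is a whole number,
# so it should not be used directly as an index.
mid_float = n / 2

# Floor division of two ints stays an int and is safe as an index.
mid_int = n // 2
middle = values[mid_int]

# When indices come out of a floating-point computation, cast them to
# an integer dtype explicitly before indexing.
float_indices = np.array([0.0, 2.0, 4.0])
picked = values[float_indices.astype(np.intp)]
```

In later numpy releases the deprecation became a hard error, so an explicit cast like the one above is the safer habit.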