I have started using scikit-learn for my work, so I was going through the tutorial, which gives the standard procedure for loading some datasets:
    $ python
    >>> from sklearn import datasets
    >>> iris = datasets.load_iris()
    >>> digits = datasets.load_digits()
However, for my convenience, I tried loading the data in the following way:
    In [1]: import sklearn

    In [2]: iris = sklearn.datasets.load_iris()
However, this throws the following error:
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-2-db77d2036db5> in <module>()
    ----> 1 iris = sklearn.datasets.load_iris()

    AttributeError: 'module' object has no attribute 'datasets'
However, if I use the apparently similar method:
    In [3]: from sklearn import datasets

    In [4]: iris = datasets.load_iris()
It works without any problem. In fact, the following also works:
    In [5]: iris = sklearn.datasets.load_iris()
I am completely confused about this. Am I missing something very trivial? What is the difference between the two approaches?
By default, all scikit-learn data is stored in subfolders of '~/scikit_learn_data'. If download_if_missing is False, an IOError is raised when the data is not available locally, instead of trying to download it from the source site. If return_X_y is True, returns (data.
The sklearn.datasets package embeds some small toy datasets, as introduced in the Getting Started section. This package also features helpers to fetch larger datasets commonly used by the machine learning community to benchmark algorithms on data that comes from the 'real world'.
sklearn is a package. This answer puts it succinctly: when you import a package, only the variables, functions, and classes defined in that package's __init__.py file are directly visible, not its sub-packages or modules (unless __init__.py itself imports them).
datasets is a sub-package of sklearn. This is why the following happens:
    In [1]: import sklearn

    In [2]: sklearn.datasets
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-2-325a2bfc35d0> in <module>()
    ----> 1 sklearn.datasets

    AttributeError: module 'sklearn' has no attribute 'datasets'
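The same behaviour can be reproduced without scikit-learn. The sketch below builds a throwaway package in a temporary directory; the names toypkg, toydata, version and load_toy are hypothetical stand-ins for sklearn and sklearn.datasets, and the package's __init__.py deliberately does not import its sub-module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout (names are illustrative only):
#   toypkg/__init__.py  defines version() and does NOT import toydata
#   toypkg/toydata.py   a sub-module, standing in for sklearn/datasets
root = Path(tempfile.mkdtemp())
pkg_dir = root / "toypkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("def version():\n    return '0.1'\n")
(pkg_dir / "toydata.py").write_text("def load_toy():\n    return [1, 2, 3]\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the new files are seen by the finders

toypkg = importlib.import_module("toypkg")  # same effect as `import toypkg`

# Names bound in __init__.py are attributes of the package...
print(hasattr(toypkg, "version"))  # True

# ...but the sub-module was never imported, so accessing it fails.
try:
    toypkg.toydata
except AttributeError as exc:
    print("AttributeError:", exc)
```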
However, the reason why this works:
    In [3]: from sklearn import datasets

    In [4]: sklearn.datasets
    Out[4]: <module 'sklearn.datasets' from '/home/ethan/.virtualenvs/test3/lib/python3.5/site-packages/sklearn/datasets/__init__.py'>
is that loading the sub-package datasets with from sklearn import datasets also binds it as the attribute datasets on the parent package sklearn (and registers it in sys.modules as 'sklearn.datasets'). This is one of the lesser-known "traps" of the Python import system.
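A minimal sketch of that side effect, again using a hypothetical throwaway package (toypkg2 and toydata are illustrative names, not part of scikit-learn). The from-import line plays the role of from sklearn import datasets.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package: an empty __init__.py plus one sub-module.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "toypkg2"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "toydata.py").write_text("def load_toy():\n    return [1, 2, 3]\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import toypkg2  # noqa: E402

print(hasattr(toypkg2, "toydata"))  # False: only __init__.py has run

# Mirrors `from sklearn import datasets`: importing the sub-module also
# binds it as an attribute on the parent package object.
from toypkg2 import toydata  # noqa: E402

print(hasattr(toypkg2, "toydata"))                        # True
print(toypkg2.toydata is toydata)                         # True: same object
print(toypkg2.toydata is sys.modules["toypkg2.toydata"])  # True
print(toypkg2.toydata.load_toy())                         # [1, 2, 3]
```

The usual fix follows from this: writing import sklearn.datasets loads the sub-package and binds it on sklearn, after which sklearn.datasets.load_iris() works.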
Also note that if you look at the __init__.py of sklearn, you will see 'datasets' listed in __all__, but that only affects star imports, allowing you to do:
    In [1]: from sklearn import *

    In [2]: datasets
    Out[2]: <module 'sklearn.datasets' from '/home/ethan/.virtualenvs/test3/lib/python3.5/site-packages/sklearn/datasets/__init__.py'>
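To see what __all__ does and does not do, here is a sketch with a hypothetical package toypkg3 whose __init__.py lists the sub-module toydata in __all__ without importing it. Listing the name has no effect on a plain import; a star import walks __all__ and imports each listed sub-module on demand.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package: __all__ names the sub-module, but __init__.py
# never imports it (all names here are illustrative only).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "toypkg3"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("__all__ = ['toydata']\n")
(pkg_dir / "toydata.py").write_text("def load_toy():\n    return [1, 2, 3]\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import toypkg3  # noqa: E402

# Listing a name in __all__ does not import it by itself.
print(hasattr(toypkg3, "toydata"))  # False

# The star import imports toypkg3.toydata because it is listed in __all__.
from toypkg3 import *  # noqa: E402,F403

print(toydata.load_toy())           # [1, 2, 3]  # noqa: F405
print(hasattr(toypkg3, "toydata"))  # True: now bound on the package as well
```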
One last point: if you inspect either sklearn or datasets, you will see that, although they are packages, their type is module. This is because all packages are modules, but not all modules are packages.
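The distinction can be checked with a standard-library package instead of scikit-learn: json is a package and json.decoder is a plain module inside it. Both have the same type; what marks a module as a package is its __path__ attribute, the list of locations searched for its sub-modules.

```python
import json
import json.decoder
import types

# The package and its sub-module are instances of the same type.
print(type(json))                          # <class 'module'>
print(type(json.decoder))                  # <class 'module'>
print(isinstance(json, types.ModuleType))  # True

# Only the package carries a __path__ attribute.
print(hasattr(json, "__path__"))           # True: json is a package
print(hasattr(json.decoder, "__path__"))   # False: json.decoder is a plain module
```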