sklearn doesn't have attribute 'datasets'

I have started using scikit-learn for my work, so I was going through the tutorial, which gives the standard procedure for loading some datasets:

    $ python
    >>> from sklearn import datasets
    >>> iris = datasets.load_iris()
    >>> digits = datasets.load_digits()

However, for my convenience, I tried loading the data in the following way:

    In [1]: import sklearn

    In [2]: iris = sklearn.datasets.load_iris()

However, this throws the following error:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-2-db77d2036db5> in <module>()
    ----> 1 iris = sklearn.datasets.load_iris()

    AttributeError: 'module' object has no attribute 'datasets'

However, if I use the apparently similar method:

    In [3]: from sklearn import datasets

    In [4]: iris = datasets.load_iris()

It works without a problem. In fact, the following also works:

    In [5]: iris = sklearn.datasets.load_iris()

I am completely confused about this. Am I missing something very trivial? What is the difference between the two approaches?

asked Jan 04 '17 15:01 by Peaceful



1 Answer

sklearn is a package. This answer said it very succinctly:

when you import a package, only variables/functions/classes in the __init__.py file of that package are directly visible, not sub-packages or modules.

datasets is a sub-package of sklearn. This is why this happens:

    In [1]: import sklearn

    In [2]: sklearn.datasets
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-2-325a2bfc35d0> in <module>()
    ----> 1 sklearn.datasets

    AttributeError: module 'sklearn' has no attribute 'datasets'
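To reproduce this without needing scikit-learn installed, here is a minimal sketch that builds a throwaway package on disk. The names toypkg, toypkg.datasets and its dummy load_iris() are hypothetical stand-ins for sklearn and sklearn.datasets, assuming the same layout: a package whose __init__.py does not import its sub-package. It shows the AttributeError after a plain import, and that naming the sub-package explicitly (import toypkg.datasets) makes the dotted access work.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for sklearn: "toypkg" has an empty __init__.py,
# and its sub-package "toypkg.datasets" defines a dummy load_iris().
tmp = Path(tempfile.mkdtemp())  # left in place for simplicity
(tmp / "toypkg" / "datasets").mkdir(parents=True)
(tmp / "toypkg" / "__init__.py").write_text("")
(tmp / "toypkg" / "datasets" / "__init__.py").write_text(
    "def load_iris():\n    return 'iris data'\n"
)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

import toypkg  # runs only toypkg/__init__.py, which never imports datasets

error = None
try:
    toypkg.datasets.load_iris()
except AttributeError as exc:
    error = str(exc)
print(error)  # the same kind of AttributeError as in the question

# Naming the sub-package explicitly loads it and binds it on toypkg.
import toypkg.datasets

print(toypkg.datasets.load_iris())  # iris data
```

The same explicit form applies to the real library: import sklearn.datasets loads the sub-package and binds the top-level name sklearn, so sklearn.datasets.load_iris() then works.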

However, the reason why this works:

    In [3]: from sklearn import datasets

    In [4]: sklearn.datasets
    Out[4]: <module 'sklearn.datasets' from '/home/ethan/.virtualenvs/test3/lib/python3.5/site-packages/sklearn/datasets/__init__.py'>

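is that when you load the sub-package datasets by doing from sklearn import datasets, it is automatically bound as an attribute of the sklearn package, in addition to the name datasets in your own namespace. This is one of the lesser-known "traps" of the Python import system. The binding happens no matter which code triggers the import, which is also why the question's In [5] works.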
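A minimal sketch of that side effect, again using a hypothetical package (demopkg with a sub-package demopkg.datasets) instead of sklearn so it runs anywhere. It checks for the attribute before and after from demopkg import datasets, and confirms that the parent's attribute, the local name and the sys.modules entry are one and the same module object.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package "demopkg" (empty __init__.py) with a sub-package
# "demopkg.datasets", laid out like sklearn and sklearn.datasets.
tmp = Path(tempfile.mkdtemp())
(tmp / "demopkg" / "datasets").mkdir(parents=True)
(tmp / "demopkg" / "__init__.py").write_text("")
(tmp / "demopkg" / "datasets" / "__init__.py").write_text(
    "def load_iris():\n    return 'iris data'\n"
)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

import demopkg

before = hasattr(demopkg, "datasets")  # sub-package not loaded yet

# Binds the local name "datasets" AND sets demopkg.datasets as a side effect.
from demopkg import datasets

after = hasattr(demopkg, "datasets")

# All three names refer to one module object, cached under its dotted name.
same = demopkg.datasets is datasets is sys.modules["demopkg.datasets"]

print(before, after, same)  # False True True
```

Because the attribute is set on the shared package object cached in sys.modules, it is visible to every piece of code that imported demopkg, regardless of where the sub-package import happened.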

Also note that if you look at sklearn's __init__.py, you will see 'datasets' listed in __all__, but __all__ only controls star imports, so it only allows you to do:

    In [1]: from sklearn import *

    In [2]: datasets
    Out[2]: <module 'sklearn.datasets' from '/home/ethan/.virtualenvs/test3/lib/python3.5/site-packages/sklearn/datasets/__init__.py'>
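The __all__ behaviour can be checked with another hypothetical package, starpkg, whose __init__.py lists "datasets" in __all__ but never imports it. This sketch assumes that mirrors the relevant part of sklearn's __init__.py at the time of the answer. A plain import still leaves the attribute missing, while a star import imports each sub-module named in __all__.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package that names its sub-package in __all__ without importing it.
tmp = Path(tempfile.mkdtemp())
(tmp / "starpkg" / "datasets").mkdir(parents=True)
(tmp / "starpkg" / "__init__.py").write_text('__all__ = ["datasets"]\n')
(tmp / "starpkg" / "datasets" / "__init__.py").write_text(
    "def load_iris():\n    return 'iris data'\n"
)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

import starpkg

before = hasattr(starpkg, "datasets")  # __all__ alone loads nothing

# A star import (allowed only at module level) imports every name in __all__,
# loading sub-packages that are not attributes yet.
from starpkg import *  # noqa: E402,F401,F403

print(before)                        # False
print(datasets.load_iris())          # iris data
print(hasattr(starpkg, "datasets"))  # True
```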

One last point: if you inspect either sklearn or datasets, you will see that, although they are packages, their type is module. This is because all packages are modules (a package is simply a module that has a __path__ attribute), but not all modules are packages.
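To see the package/module distinction directly, this sketch builds a hypothetical pathpkg containing a sub-package (datasets) and a plain module (helpers). All three objects are instances of types.ModuleType; only the two packages carry a __path__ attribute.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Hypothetical layout: two packages (pathpkg, pathpkg.datasets) and one plain module.
tmp = Path(tempfile.mkdtemp())
(tmp / "pathpkg" / "datasets").mkdir(parents=True)
(tmp / "pathpkg" / "__init__.py").write_text("")
(tmp / "pathpkg" / "datasets" / "__init__.py").write_text("")
(tmp / "pathpkg" / "helpers.py").write_text("")  # a module, not a package
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

import pathpkg.datasets
import pathpkg.helpers

results = []
for mod in (pathpkg, pathpkg.datasets, pathpkg.helpers):
    # (name, is a module object?, is a package?)
    results.append(
        (mod.__name__, isinstance(mod, types.ModuleType), hasattr(mod, "__path__"))
    )

for row in results:
    print(*row)
# pathpkg True True
# pathpkg.datasets True True
# pathpkg.helpers True False
```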

answered Sep 18 '22 21:09 by elethan