
Are there any example data sets for Python?

Tags: python, dataset

For quick testing, debugging, creating portable examples, and benchmarking, R has a large number of data sets available (in the base R datasets package). The command library(help="datasets") at the R prompt describes nearly 100 historical datasets, each of which has an associated description and metadata.

Is there anything like this for Python?

a different ben asked May 16 '13 05:05



2 Answers

You can use the rpy2 package to access all R datasets from Python.

Set up the interface:

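>>> from rpy2.robjects import r, pandas2ri
>>> def data(name):
...     # Convert the R data frame to a pandas DataFrame.
...     # (In rpy2 3.x, ri2py was renamed to rpy2py.)
...     return pandas2ri.ri2py(r[name])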

Then call data() with the name of any available dataset (just like in R):

>>> df = data('iris')
>>> df.describe()
       Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.057333      3.758000     1.199333
std        0.828066     0.435866      1.765298     0.762238
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

To see a list of the available datasets with a description for each:

>>> print(r.data()) 

Note: rpy2 requires an R installation with the R_HOME environment variable set, and pandas must be installed as well.

UPDATE

I just created PyDataset, a simple module that makes loading a dataset from Python as easy as it is in R (it does not require an R installation, only pandas).

To start using it, install the module:

$ pip install pydataset 

Then just load up any dataset you wish (currently around 757 datasets available):

from pydataset import data

titanic = data('titanic')

Aziz Alto answered Sep 22 '22 18:09


There are also datasets available from the Scikit-Learn library.

from sklearn import datasets 

There are multiple datasets within this package. Some of the toy datasets are:

load_boston()          Load and return the Boston house-prices dataset (regression); removed in scikit-learn 1.2.
load_iris()            Load and return the iris dataset (classification).
load_diabetes()        Load and return the diabetes dataset (regression).
load_digits([n_class]) Load and return the digits dataset (classification).
load_linnerud()        Load and return the linnerud dataset (multivariate regression).
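As a rough sketch of how one of these loaders can be used, the snippet below loads the iris dataset and wraps it in a pandas DataFrame so it looks similar to the R version. It assumes scikit-learn and pandas are installed; the variable name `df` and the `species` column are illustrative choices of mine, not something scikit-learn provides.

```python
import pandas as pd
from sklearn import datasets

# load_iris() returns a Bunch object with the data, labels, and a description.
iris = datasets.load_iris()

# Build a DataFrame from the feature matrix, using the bundled column names.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add a readable label column (the name "species" is an arbitrary choice here).
df["species"] = [str(iris.target_names[i]) for i in iris.target]

print(df.shape)          # (150, 5): four measurements plus the label column
print(df.describe())     # summary statistics for the numeric columns
print(iris.DESCR[:300])  # start of the description text shipped with the dataset
```

The `DESCR` attribute is the closest equivalent to the per-dataset documentation that R's datasets package provides.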
tmthydvnprt answered Sep 22 '22 18:09