 

Loading a dataset from file, to use with sklearn/numpy, including labels

I saw that with sklearn we can use some predefined datasets, for example mydataset = datasets.load_digits(). Then we can get an array (a numpy array?) of the dataset with mydataset.data and an array of the corresponding labels with mydataset.target. However, I want to load my own dataset so that I can use it with sklearn. How, and in which format, should I load my data? My file has the following format (each line is a data point):

-0.2080,0.3480,0.3280,0.5040,0.9320,1.0000,label1
-0.2864,0.1992,0.2822,0.4398,0.7012,0.7800,label3
...
...
-0.2348,0.3826,0.6142,0.7492,0.0546,-0.4020,label2
-0.1856,0.3592,0.7126,0.7366,0.3414,0.1018,label1
Asked by shn on Feb 27 '13 at 10:02




1 Answer

You can use numpy's genfromtxt function to read the data from the file (http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html):

import numpy as np
mydata = np.genfromtxt(filename, delimiter=",")

However, if you have textual columns (such as the label column in your file), using genfromtxt is trickier, since you need to specify the data types. With the default float type, text values are read as nan.
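One way to handle the label column with genfromtxt is to read the file in two passes, once for the numeric columns and once for the labels. This is only a minimal sketch: the two rows in `lines` are copied from the question and stand in for the real file, and the column count of six is an assumption taken from that sample.

```python
import numpy as np

# Two rows in the question's format, standing in for the real file.
# genfromtxt also accepts a filename here instead of a list of lines.
lines = [
    "-0.2080,0.3480,0.3280,0.5040,0.9320,1.0000,label1",
    "-0.2864,0.1992,0.2822,0.4398,0.7012,0.7800,label3",
]

# First pass: the six numeric columns, parsed as floats (the default dtype).
data = np.genfromtxt(lines, delimiter=",", usecols=range(6), encoding="utf-8")

# Second pass: the last column only, parsed as strings.
target = np.genfromtxt(lines, delimiter=",", usecols=6, dtype=str, encoding="utf-8")

print(data.shape)   # (rows, 6)
print(target)
```

Here `data` and `target` play the same roles as `mydataset.data` and `mydataset.target` in the question.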

It will be much easier with the excellent Pandas library (http://pandas.pydata.org/)

import pandas as pd
mydata = pd.read_csv(filename)
target = mydata["Label"]  # provided your csv has a header row, and the label column is named "Label"

# select all but the last column as data
# (.iloc replaces the .ix indexer, which was removed in pandas 1.0)
data = mydata.iloc[:, :-1]
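The file shown in the question has no header row, so the pandas approach can be adapted by passing header=None and selecting the label column by position. The following is a sketch under that assumption, with two sample rows from the question standing in for the real file:

```python
import io
import pandas as pd

# Two rows in the question's format, standing in for the real file.
# A path to the real file can be passed to read_csv instead.
sample = io.StringIO(
    "-0.2080,0.3480,0.3280,0.5040,0.9320,1.0000,label1\n"
    "-0.2864,0.1992,0.2822,0.4398,0.7012,0.7800,label3\n"
)

# header=None tells pandas that the first line is data, not column names.
mydata = pd.read_csv(sample, header=None)

# All columns but the last are features; the last column holds the labels.
data = mydata.iloc[:, :-1].to_numpy()
target = mydata.iloc[:, -1].to_numpy()

print(data.shape)   # (rows, 6)
print(target)
```

The resulting arrays can then be passed to a scikit-learn estimator, for example as fit(data, target).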
Answered by Ando Saabas on Nov 07 '22 at 02:11