Loading a dataset from file, to use with sklearn/numpy, including labels

Tags: python, numpy, dataset, scikit-learn

I saw that sklearn comes with some predefined datasets. For example, with mydataset = datasets.load_digits() we can get an array (a numpy array?) of the data, mydataset.data, and an array of the corresponding labels, mydataset.target. However, I want to load my own dataset so that I can use it with sklearn. How, and in which format, should I load my data? My file has the following format (each line is a data point):

-0.2080,0.3480,0.3280,0.5040,0.9320,1.0000,label1
-0.2864,0.1992,0.2822,0.4398,0.7012,0.7800,label3
...
...
-0.2348,0.3826,0.6142,0.7492,0.0546,-0.4020,label2
-0.1856,0.3592,0.7126,0.7366,0.3414,0.1018,label1


asked Feb 27 '13 10:02

shn

1 Answer

You can use NumPy's genfromtxt function to read the data from the file (http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html):

import numpy as np
# filename: path to your data file; text columns (such as the labels) come back as nan
mydata = np.genfromtxt(filename, delimiter=",")
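To see what the plain call does with this particular file, here is a minimal sketch that feeds two lines in the question's format to genfromtxt through io.StringIO (a stand-in for the real file, assumed only so the snippet runs on its own). With the default float dtype, the label column cannot be parsed and is filled with nan, while the six numeric columns load correctly:

```python
import io
import numpy as np

# Stand-in for the question's file: six numeric columns plus a text label.
sample = io.StringIO(
    "-0.2080,0.3480,0.3280,0.5040,0.9320,1.0000,label1\n"
    "-0.2864,0.1992,0.2822,0.4398,0.7012,0.7800,label3\n"
)

# Default dtype is float; unparseable values ("label1", ...) become nan.
mydata = np.genfromtxt(sample, delimiter=",")

features = mydata[:, :-1]  # the six numeric columns, shape (2, 6)
labels = mydata[:, -1]     # only nan values: the label text is lost
```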

However, if the file has text columns (like the labels here), genfromtxt is trickier to use, since you need to specify the data types yourself.
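One way to specify the types is to read the file twice with usecols: once for the numeric columns as floats and once for the label column as strings. This is a minimal sketch, assuming the seven-column layout from the question and an io.StringIO stand-in for the file (with a real file you would pass its path instead):

```python
import io
import numpy as np

# Two lines in the question's format, standing in for the real file.
text = (
    "-0.2080,0.3480,0.3280,0.5040,0.9320,1.0000,label1\n"
    "-0.2864,0.1992,0.2822,0.4398,0.7012,0.7800,label3\n"
)

# Features: only columns 0-5, parsed as floats -> shape (n_samples, 6).
X = np.genfromtxt(io.StringIO(text), delimiter=",",
                  usecols=list(range(6)), dtype=float)

# Labels: only column 6, kept as strings -> shape (n_samples,).
y = np.genfromtxt(io.StringIO(text), delimiter=",",
                  usecols=6, dtype=str, encoding="utf-8")
```

X and y then have the same roles as mydataset.data and mydataset.target from load_digits.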

It is much easier with the excellent pandas library (http://pandas.pydata.org/):

import pandas as pd
mydata = pd.read_csv(filename)
target = mydata["Label"]  # provided your CSV has a header row and the label column is named "Label"

# select all but the last column as data (.iloc selects by position)
data = mydata.iloc[:, :-1]
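The file in the question has no header row, so a variant of the above passes header=None (columns are then numbered 0 to 6) and selects the label column by position; .to_numpy() turns the results into plain NumPy arrays. This is a sketch, assuming the seven-column layout shown in the question and an io.StringIO stand-in for the file:

```python
import io
import pandas as pd

# Stand-in for the question's headerless file.
sample = io.StringIO(
    "-0.2080,0.3480,0.3280,0.5040,0.9320,1.0000,label1\n"
    "-0.2864,0.1992,0.2822,0.4398,0.7012,0.7800,label3\n"
    "-0.2348,0.3826,0.6142,0.7492,0.0546,-0.4020,label2\n"
)

# header=None: the first line is data, not column names.
mydata = pd.read_csv(sample, header=None)

data = mydata.iloc[:, :-1].to_numpy()   # float array, shape (n_samples, 6)
target = mydata.iloc[:, -1].to_numpy()  # 1-D array of the label strings
```

This is the shape scikit-learn estimators expect in fit(X, y): X of shape (n_samples, n_features) and y of shape (n_samples,). Classifiers accept string labels such as "label1" directly.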


answered Nov 07 '22 02:11

Ando Saabas
