numpy array from csv file for lasagne

I started learning how to use Theano with Lasagne, beginning with the MNIST example. Now I want to try my own example: I have a train.csv file in which every row starts with a 0 or 1 that represents the correct answer, followed by 773 0s and 1s that represent the input. I don't understand how to turn this file into the numpy arrays that the load_dataset() function should return. This is the relevant part of the original function for the MNIST dataset:

...

with gzip.open(filename, 'rb') as f:
    data = pickle_load(f, encoding='latin-1')

# The MNIST dataset we have here consists of six numpy arrays:
# Inputs and targets for the training set, validation set and test set.
X_train, y_train = data[0]
X_val, y_val = data[1]
X_test, y_test = data[2]

...

# We just return all the arrays in order, as expected in main().
# (It doesn't matter how we do this as long as we can read them again.)
return X_train, y_train, X_val, y_val, X_test, y_test

I need to get X_train (the inputs) and y_train (the first value of every row) from my CSV files.

Thanks!

user5165960 asked Jul 28 '15 17:07


1 Answer

You can use numpy.genfromtxt() or numpy.loadtxt() as follows:

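import numpy
from sklearn.model_selection import KFold  # was sklearn.cross_validation before scikit-learn 0.18

Xy = numpy.genfromtxt('yourfile.csv', delimiter=",")

# the next section provides the required
# training-validation set splitting, but
# you can do it manually too, if you want

kf = KFold(n_splits=3)

# use only the first of the three folds: one third of the rows
# becomes the validation set, the rest the training set
ind_train, ind_valid = next(kf.split(Xy))

Xy_train, Xy_valid = Xy[ind_train], Xy[ind_valid]

# column 0 is the target, columns 1..773 are the inputs;
# the MNIST example uses float32 inputs and int32 (T.ivector) targets
X_train = Xy_train[:, 1:].astype(numpy.float32)
y_train = Xy_train[:, 0].astype(numpy.int32)

X_val = Xy_valid[:, 1:].astype(numpy.float32)
y_val = Xy_valid[:, 0].astype(numpy.int32)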


...

# you can simply ignore the test sets in your case
return X_train, y_train, X_val, y_val #, X_test, y_test

This snippet does not return a test set.
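If you would rather not depend on scikit-learn just for one split, the split can also be done by hand with NumPy. The sketch below assumes a comma-separated file with no header row and the label in the first column; the function name, the 20% validation fraction and the fixed seed are hypothetical choices, not part of the Lasagne example. Inputs are cast to float32 and labels to int32 because the MNIST example declares its targets with T.ivector.

```python
import numpy as np


def load_dataset(filename, val_fraction=0.2, seed=0):
    """Hypothetical loader: read a label-first CSV and split it into
    training and validation arrays (no test set)."""
    # ndmin=2 keeps a 2-D array even if the file has a single row
    Xy = np.loadtxt(filename, delimiter=",", ndmin=2)

    # shuffle the row indices reproducibly, then cut off the validation part
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(Xy))
    n_val = int(len(Xy) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    # column 0 holds the label, the remaining columns are the inputs
    X = Xy[:, 1:].astype(np.float32)
    y = Xy[:, 0].astype(np.int32)

    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]
```

Usage: `X_train, y_train, X_val, y_val = load_dataset('train.csv')`. Shuffling before splitting matters if the rows in the file are sorted by label.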

You can now load your dataset in the main module or script, but be sure to remove all the test-set code there as well.

Alternatively, you can simply pass the validation set as the test set:

# reuse the validation set as the `test` set
return X_train, y_train, X_val, y_val, X_val, y_val

In the latter case you don't need to change the parts of the main module that expect a test set, but any test scores it reports will just be the validation scores repeated.
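If you want a genuine test set instead of reusing the validation set, you can cut three disjoint parts out of the same shuffled array, so load_dataset() can keep returning all six arrays in the order main() expects. A minimal NumPy sketch, assuming the label-first layout from the question; the helper name, the 60/20/20 proportions and the seed are hypothetical:

```python
import numpy as np


def split_train_val_test(Xy, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle a label-first array and return
    X_train, y_train, X_val, y_val, X_test, y_test."""
    rng = np.random.RandomState(seed)
    Xy = Xy[rng.permutation(len(Xy))]

    n_val = int(len(Xy) * val_fraction)
    n_test = int(len(Xy) * test_fraction)
    # np.split cuts at the given row indices: [test | val | train]
    test, val, train = np.split(Xy, [n_test, n_test + n_val])

    def xy(part):
        # first column is the label, the rest are the inputs
        return part[:, 1:].astype(np.float32), part[:, 0].astype(np.int32)

    X_train, y_train = xy(train)
    X_val, y_val = xy(val)
    X_test, y_test = xy(test)
    return X_train, y_train, X_val, y_val, X_test, y_test
```

The loader could then end with `return split_train_val_test(Xy)`, and main() needs no changes to its test-set code.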

Note: I don't know which MNIST example you are using, but after preparing your data as above you will probably also have to modify your training module to suit your data, for example the input shape and the output shape (the number of classes). In your case the former is 773 and the latter is 2.
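Rather than hard-coding 773 and 2, the sizes can be read off the loaded arrays. The sketch assumes X_train and y_train are the arrays produced by the loader code in this answer and that labels run from 0 to k-1; the helper name is hypothetical. In the Lasagne MNIST example the input layer has shape (None, 1, 28, 28) and its input variable is a T.tensor4; for flat 773-value rows a 2-D shape and a T.matrix input should fit instead. The Lasagne lines are shown only as comments, since they depend on the rest of your script.

```python
import numpy as np


def network_sizes(X_train, y_train):
    """Hypothetical helper: infer the input width and the number of
    output units from the training arrays."""
    n_features = X_train.shape[1]      # 773 for the data in the question
    n_classes = int(y_train.max()) + 1  # assumes labels are 0..k-1
    return n_features, n_classes


# With Lasagne these values would be used roughly like this
# (not executed here; input_var and l_hidden come from your own script):
#   input_var = T.matrix('inputs')
#   l_in = lasagne.layers.InputLayer(shape=(None, n_features), input_var=input_var)
#   l_out = lasagne.layers.DenseLayer(l_hidden, num_units=n_classes,
#                                     nonlinearity=lasagne.nonlinearities.softmax)
```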

Geeocode answered Nov 05 '22 23:11