I started learning how to use Theano with Lasagne, beginning with the MNIST example. Now I want to try my own data: I have a train.csv file in which every row starts with a 0 or 1 (the correct answer), followed by 773 0s and 1s (the input). I don't understand how to turn this file into the NumPy arrays expected from the load_database() function. This is the relevant part of the original function for the MNIST database:
...
with gzip.open(filename, 'rb') as f:
    data = pickle_load(f, encoding='latin-1')
# The MNIST dataset we have here consists of six numpy arrays:
# Inputs and targets for the training set, validation set and test set.
X_train, y_train = data[0]
X_val, y_val = data[1]
X_test, y_test = data[2]
...
# We just return all the arrays in order, as expected in main().
# (It doesn't matter how we do this as long as we can read them again.)
return X_train, y_train, X_val, y_val, X_test, y_test
I need to get X_train (the input) and y_train (the first value of every row) from my CSV file.
Thanks!
You can use numpy.genfromtxt() or numpy.loadtxt() as follows:
import numpy
# KFold lives in sklearn.model_selection in current scikit-learn releases
from sklearn.model_selection import KFold
# column 0 holds the target, the remaining columns hold the inputs
Xy = numpy.genfromtxt('yourfile.csv', delimiter=",")
# the next section provides the required
# training-validation set splitting but
# you can do it manually too, if you want
kf = KFold(n_splits=3)
# keep only the first of the folds
ind_train, ind_valid = next(kf.split(Xy))
Xy_train, Xy_val = Xy[ind_train], Xy[ind_valid]
X_train = Xy_train[:, 1:]
y_train = Xy_train[:, 0]
X_val = Xy_val[:, 1:]
y_val = Xy_val[:, 0]
...
# you can simply ignore the test sets in your case
return X_train, y_train, X_val, y_val #, X_test, y_test
The snippet above does not return a test set.
Now you can import your dataset into the main module or script, but remember to remove every part that uses the test set from it too.
Alternatively, you can simply pass the validation set as the test set:
# you can simply pass the valid sets as `test` set
return X_train, y_train, X_val, y_val, X_val, y_val
In the latter case you don't have to change the sections of the main module that refer to the expected test set, but any test scores it reports will just be the validation scores repeated.
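If your file has enough rows, a third option is to carve out a real test set by hand, so the six-array return that main() expects stays intact. This is only a rough sketch: the function name split_csv, the 20%/20% fractions, the seed and the tiny in-memory CSV are all my own assumptions for illustration, not something from the MNIST example.

```python
import io
import numpy as np

def split_csv(source, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Load label-first CSV rows and split them into train/val/test arrays.

    Hypothetical helper: the fractions and the seed are illustrative defaults.
    """
    Xy = np.atleast_2d(np.genfromtxt(source, delimiter=","))
    rng = np.random.RandomState(seed)
    Xy = Xy[rng.permutation(len(Xy))]  # shuffle the rows before splitting
    n_test = int(len(Xy) * test_fraction)
    n_val = int(len(Xy) * val_fraction)
    # first n_test rows -> test, next n_val rows -> validation, rest -> train
    test, val, train = np.split(Xy, [n_test, n_test + n_val])
    arrays = []
    for part in (train, val, test):
        # column 0 is the target, the remaining columns are the inputs
        arrays.extend([part[:, 1:], part[:, 0]])
    return tuple(arrays)

# Tiny in-memory stand-in for train.csv (3 input columns instead of 773).
demo = io.StringIO("1,0,1,1\n0,1,0,0\n1,1,1,0\n0,0,0,1\n1,0,0,0\n")
X_train, y_train, X_val, y_val, X_test, y_test = split_csv(demo)
```

With a real file you would pass the path instead of the StringIO object, e.g. split_csv('train.csv'), and return the six arrays from your loading function in the same order as before.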
Note: I don't know which MNIST example you are using, but after preparing your data as above you will probably have to modify your trainer module to suit your data as well, for example the input shape and the output shape (the number of classes). In your case the former is 773 and the latter is 2.
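Before touching the network, it may help to check that the arrays really have those shapes. The sketch below assumes your trainer wants a float32 input matrix of shape (n_samples, 773) and int32 class labels in {0, 1}; that is how I would expect a Theano/Lasagne classifier to be fed, but please verify it against your own script. The helper name prepare_arrays and the random demo data are made up for illustration.

```python
import numpy as np

def prepare_arrays(X, y, n_features=773, n_classes=2):
    """Cast to the assumed dtypes and check the shapes described in the note.

    Hypothetical helper: float32 inputs and int32 targets are assumptions
    about what the trainer module expects.
    """
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y).astype(np.int32)
    if X.ndim != 2 or X.shape[1] != n_features:
        raise ValueError("expected inputs of shape (n_samples, %d), got %s"
                         % (n_features, X.shape))
    if y.shape != (X.shape[0],):
        raise ValueError("expected one target per input row")
    if y.min() < 0 or y.max() >= n_classes:
        raise ValueError("targets must be class indices below %d" % n_classes)
    return X, y

# Random stand-in data with the same layout that genfromtxt would produce
# (float64 values that are all 0.0 or 1.0).
rng = np.random.RandomState(0)
X_demo = rng.randint(0, 2, size=(4, 773)).astype(np.float64)
y_demo = rng.randint(0, 2, size=4).astype(np.float64)
X_ok, y_ok = prepare_arrays(X_demo, y_demo)
```

In the network definition itself, this would correspond to an input layer shape of (None, 773) and an output layer with 2 units, instead of the image-shaped input and 10 classes used for MNIST.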