I have some working code which correctly loads data from a csv file into a PyBrain Dataset:
def old_get_dataset():
    reader = csv.reader(open('test.csv', 'rb'))
    header = reader.next()
    fields = dict(zip(header, range(len(header))))
    print header
    # assume last field in csv is single target variable
    # and all other fields are input variables
    dataset = SupervisedDataSet(len(fields) - 1, 1)
    for row in reader:
        #print row[:-1]
        #print row[-1]
        dataset.addSample(row[:-1], row[-1])
    return dataset
Now I'm trying to rewrite this code to use numpy's loadtxt function instead. I believe addSample can take numpy arrays rather than having to add the data one row at a time.
Assuming my loaded numpy array has shape m x n, how do I pass the first n-1 columns (an m x (n-1) block) as the first parameter and the last column as the second parameter? This is what I'm trying:
def get_dataset():
    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1)
    # assume last field in csv is single target variable
    # and all other fields are input variables
    number_of_columns = array.shape[1]
    dataset = SupervisedDataSet(number_of_columns - 1, 1)
    #print array[0]
    #print array[:,:-1]
    #print array[:,-1]
    dataset.addSample(array[:,:-1], array[:,-1])
    return dataset
But I'm getting the following error:
Traceback (most recent call last):
  File "C:\test.py", line 109, in <module>
    (d, n, t) = main()
  File "C:\test.py", line 87, in main
    ds = get_dataset()
  File "C:\test.py", line 45, in get_dataset
    dataset.addSample(array[:,:-1], array[:,-1])
  File "C:\Python27\lib\site-packages\pybrain\datasets\supervised.py", line 45, in addSample
    self.appendLinked(inp, target)
  File "C:\Python27\lib\site-packages\pybrain\datasets\dataset.py", line 215, in appendLinked
    self._appendUnlinked(l, args[i])
  File "C:\Python27\lib\site-packages\pybrain\datasets\dataset.py", line 197, in _appendUnlinked
    self.data[label][self.endmarker[label], :] = row
ValueError: output operand requires a reduction, but reduction is not enabled
How can I fix this?
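For what it's worth, the last frame of the traceback (`self.data[label][self.endmarker[label], :] = row`) suggests that addSample writes its argument into a single row of the dataset's storage, so handing it the whole m x (n-1) block cannot fit. The following is a minimal NumPy-only sketch of that situation, not PyBrain's actual code: `storage` and `all_inputs` are made-up stand-ins, and the exact error message depends on the NumPy version.

```python
import numpy as np

# Hypothetical stand-in for the dataset's internal buffer:
# one row per sample, 2 input columns.
storage = np.zeros((4, 2))

# Hypothetical data: 3 samples with 2 inputs each.
all_inputs = np.arange(6.0).reshape(3, 2)

try:
    # Writing a (3, 2) block into one row of shape (2,) is rejected.
    storage[0, :] = all_inputs
    whole_block_fits = True
except ValueError:
    whole_block_fits = False

# Writing one sample of shape (2,) into one row works.
storage[0, :] = all_inputs[0]

print(whole_block_fits)
print(storage[0])
```

If that reading is right, the options are either to call addSample once per row or to set whole columns of the dataset in one go.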
numpy.loadtxt() reads data from a text file and returns it as a NumPy array; it is intended as a fast reader for simply formatted text files. The file can be given as an absolute or a relative path.
numpy.load() returns the array stored in a disk file with the .npy extension. Its file parameter accepts a file-like object, a string, or a pathlib.Path.
In loadtxt(), converters can also be used to provide a default value for missing data, e.g. converters = lambda s: float(s.strip() or 0) will convert empty fields to 0 (default: None). The skiprows parameter skips the first skiprows lines, including comments (default: 0).
After a lot of experimenting and re-reading the dataset documentation, the following runs without error:
def get_dataset():
    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1)
    # assume last field in csv is single target variable
    # and all other fields are input variables
    number_of_columns = array.shape[1]
    dataset = SupervisedDataSet(number_of_columns - 1, 1)
    print array[0]
    #print array[:,:-1]
    #print array[:,-1]
    #dataset.addSample(array[:,:-1], array[:,-1])
    #dataset.addSample(array[:,:-1], array[:,-2:-1])
    dataset.setField('input', array[:,:-1])
    dataset.setField('target', array[:,-1:])
    return dataset
I have to double check that it's doing the right thing.
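One way to do that check is to look at the shapes of the slices. The sketch below uses NumPy only and is written for Python 3; the inline CSV text is made-up sample data standing in for test.csv. It shows that `array[:, -1]` is one-dimensional while `array[:, -1:]` keeps the target as a column, which matches a dataset declared with a target dimension of 1.

```python
import io
import numpy as np

# Made-up sample data standing in for test.csv (header row plus two samples).
csv_text = u"a,b,target\n1,2,3\n4,5,6\n"

array = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)

inputs = array[:, :-1]      # every column except the last
targets_1d = array[:, -1]   # last column, flattened to one dimension
targets_2d = array[:, -1:]  # last column, kept as an m x 1 block

print(inputs.shape, targets_1d.shape, targets_2d.shape)

# Assumption, not run here: with PyBrain installed, a per-row loop such as
#     for row in array:
#         dataset.addSample(row[:-1], row[-1])
# should fill the dataset with the same values as the two setField calls.
```

Comparing `dataset['input']` and `dataset['target']` against these slices after loading would be a reasonable sanity check.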