PyBrain: Loading data with numpy.loadtxt?

I have some working code that correctly loads data from a CSV file into a PyBrain Dataset:

def old_get_dataset():

    reader = csv.reader(open('test.csv', 'rb'))

    header = reader.next()
    fields = dict(zip(header, range(len(header))))
    print header

    # assume last field in csv is single target variable
    # and all other fields are input variables
    dataset = SupervisedDataSet(len(fields) - 1, 1)

    for row in reader:
        #print row[:-1]
        #print row[-1]
        dataset.addSample(row[:-1], row[-1])

    return dataset

Now I'm trying to rewrite this code to use numpy's loadtxt function instead. I believe addSample can take numpy arrays rather than having to add the data one row at a time.

Assuming my loaded numpy array is m x n, how do I pass the first n-1 columns (an m x (n-1) block) as the first parameter and the last column as the second parameter? This is what I'm trying:

def get_dataset():

    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1)

    # assume last field in csv is single target variable
    # and all other fields are input variables
    number_of_columns = array.shape[1]
    dataset = SupervisedDataSet(number_of_columns - 1, 1)

    #print array[0]
    #print array[:,:-1]
    #print array[:,-1]
    dataset.addSample(array[:,:-1], array[:,-1])

    return dataset

But I'm getting the following error:

Traceback (most recent call last):
  File "C:\test.py", line 109, in <module>
    (d, n, t) = main()
  File "C:\test.py", line 87, in main
    ds = get_dataset()
  File "C:\test.py", line 45, in get_dataset
    dataset.addSample(array[:,:-1], array[:,-1])
  File "C:\Python27\lib\site-packages\pybrain\datasets\supervised.py",
       line 45, in addSample self.appendLinked(inp, target)
  File "C:\Python27\lib\site-packages\pybrain\datasets\dataset.py",
       line 215, in appendLinked self._appendUnlinked(l, args[i])
  File "C:\Python27\lib\site-packages\pybrain\datasets\dataset.py",
       line 197, in _appendUnlinked self.data[label][self.endmarker[label], :] = row
ValueError: output operand requires a reduction, but reduction is not enabled

How can I fix this?

asked Apr 12 '12 23:04



1 Answer

After a lot of experimenting and re-reading the dataset documentation, the following runs without error:

import numpy
from pybrain.datasets import SupervisedDataSet

def get_dataset():

    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1)

    # assume last field in csv is single target variable
    # and all other fields are input variables
    number_of_columns = array.shape[1]
    dataset = SupervisedDataSet(number_of_columns - 1, 1)

    # debug: show the first loaded row
    print(array[0])

    # addSample adds a single sample, so passing whole arrays fails:
    #dataset.addSample(array[:,:-1], array[:,-1])

    # set each field in one go instead; the -1: slice keeps the
    # target as an m x 1 column rather than a flat length-m vector
    dataset.setField('input', array[:,:-1])
    dataset.setField('target', array[:,-1:])

    return dataset

I have to double check that it's doing the right thing.
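One way to check the slicing without PyBrain is to look at the shapes with plain numpy. This is only a sketch: the inline CSV text is a made-up stand-in for test.csv, and `buffer_` is a hypothetical array that mimics the row assignment shown in the last frame of the traceback (`self.data[label][self.endmarker[label], :] = row`), not PyBrain's actual storage.

```python
import io
import numpy as np

# Hypothetical stand-in for test.csv: header row, two inputs, one target.
csv_text = u"a,b,target\n1,2,3\n4,5,9\n6,7,13\n"
array = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)

inputs = array[:, :-1]      # every column except the last
targets_1d = array[:, -1]   # an integer index drops the column axis
targets_2d = array[:, -1:]  # a slice keeps the column axis

print(inputs.shape)
print(targets_1d.shape)
print(targets_2d.shape)

# Writing a whole 2-D block into one row of a buffer raises ValueError,
# which mirrors what handing the full array to addSample ends up doing.
buffer_ = np.zeros((8, 2))
try:
    buffer_[0, :] = inputs
    failed = False
except ValueError:
    failed = True
print(failed)
```

If the shapes come out as m x (n-1) for the inputs and m x 1 for the `-1:` slice, the two setField calls are receiving the columns intended. The wording of the ValueError may differ between numpy versions, so the sketch only checks that one is raised.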

answered Sep 29 '22 12:09