How to construct an np.array with fromiter

Tags:

python

numpy

I'm trying to construct an np.array by sampling from a Python generator that yields one row of the array per call to next. Here is some sample code:

import numpy as np
# Assumed source of shuffle and randint (the import was not shown in the post);
# numpy's randint excludes the upper bound, which matches its use as an index below.
from numpy.random import shuffle, randint

data = np.eye(9)
labels = np.array([0,0,0,1,1,1,2,2,2])

def extract_one_class(X,labels,y):
    """ Take an array of data X, a column vector array of labels, and one particular label y.  Return an array of all instances in X that have label y """

    return X[np.nonzero(labels[:] == y)[0],:]

def generate_points(data, labels, size):
    """ Generate and return 'size' pairs of points drawn from different classes """

    label_alphabet = np.unique(labels)
    assert(label_alphabet.size > 1)

    for useless in range(size):
        shuffle(label_alphabet)
        first_class = extract_one_class(data,labels,label_alphabet[0])
        second_class = extract_one_class(data,labels,label_alphabet[1])
        pair = np.hstack((first_class[randint(0,first_class.shape[0]),:],second_class[randint(0,second_class.shape[0]),:]))
        yield pair

points = np.fromiter(generate_points(data,labels,5),dtype = np.dtype('f8',(2*data.shape[1],1)))

The extract_one_class function returns a subset of data: all data points belonging to one class label. I would like to have points be an np.array with shape = (size,data.shape[1]). Currently the code snippet above returns an error:

ValueError: setting an array element with a sequence.

The documentation of fromiter says it returns a one-dimensional array. Yet others have used fromiter to construct record arrays in numpy before (e.g. http://iam.al/post/21116450281/numpy-is-my-homeboy).

Am I off the mark in assuming I can generate an array in this fashion? Or is my numpy just not quite right?

LeeZamparo asked Dec 08 '22 21:12


2 Answers

As you've noticed, the documentation of np.fromiter explains that the function creates a 1D array. You won't be able to create a 2D array that way, and @unutbu's method of returning a 1D array that you reshape afterwards is a safe bet.

However, you can indeed create structured arrays using fromiter, as illustrated by:

>>> import numpy as np
>>> a = zip((1,2,3),(10,20,30))
>>> r = np.fromiter(a,dtype=[('',int),('',int)])
>>> r
array([(1, 10), (2, 20), (3, 30)],
      dtype=[('f0', '<i8'), ('f1', '<i8')])

Note, however, that r.shape is (3,): r is really just a 1D array of records, each record consisting of two integers. Because all the fields have the same dtype, we can take a view of r as a 2D array:

>>> r.view((int,2))
array([[ 1, 10],
       [ 2, 20],
       [ 3, 30]])

So yes, you could use np.fromiter with a dtype like [('',int)]*data.shape[1]: you'll get a 1D array of length size, which you can then view as (int, data.shape[1]). You can use floats instead of ints; the important part is that all fields have the same dtype.
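As a rough sketch of that idea with float fields: the generator rows, the width n_cols (standing in for data.shape[1]) and the toy values are all made up for illustration, and each row is yielded as a tuple so that it can fill one record.

```python
import numpy as np

n_cols = 4   # hypothetical row width, standing in for data.shape[1]
size = 3     # hypothetical number of rows

def rows():
    # Hypothetical generator: each row is yielded as a tuple of floats.
    for i in range(size):
        yield tuple(float(i * n_cols + j) for j in range(n_cols))

# One unnamed float field per column; NumPy names them f0, f1, ...
row_dtype = [('', float)] * n_cols
records = np.fromiter(rows(), dtype=row_dtype)   # 1D array of records

# All fields share one dtype, so the records can be viewed as a 2D array.
points = records.view((float, n_cols))
```

Here records stays one-dimensional with one entry per row, and only the view gives the 2D layout; no data is copied.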

If you really want it, you can use some fairly complex dtype. Consider for example

r = np.fromiter(((_,) for _ in zip((1,2,3),(10,20,30))),dtype=[('',(int,2))])

Here, you get a 1D structured array with 1 field, the field consisting of an array of 2 integers. Note the use of (_,) to make sure that each record is passed as a tuple (else np.fromiter chokes). But do you need that complexity?
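For what it's worth, here is a small sketch of how the 2D data could be pulled back out of such an array. It assumes the single unnamed field gets NumPy's default name f0; the names pairs and block are only for illustration.

```python
import numpy as np

pairs = zip((1, 2, 3), (10, 20, 30))

# A single unnamed field (auto-named f0) holding a length-2 integer subarray.
# Each item is wrapped in a 1-tuple so that it fills exactly one record.
r = np.fromiter(((p,) for p in pairs), dtype=[('', (int, 2))])

# r itself is still 1D; indexing the field gives the 2D block of integers.
block = r['f0']
```

With these inputs r has shape (3,) while block has shape (3, 2).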

Note also that since you know the length of the array beforehand (it is size), you should pass the optional count argument of np.fromiter for more efficiency.
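A minimal sketch of passing count, assuming a toy generator (the name records and its values are made up). With count given, np.fromiter can allocate the output once instead of growing it as items arrive.

```python
import numpy as np

size = 5  # number of records, known in advance

def records():
    # Hypothetical generator yielding one (int, int) record per step.
    for i in range(size):
        yield (i, 10 * i)

# count tells fromiter how many items to read from the iterator.
r = np.fromiter(records(), dtype=[('', int), ('', int)], count=size)
```

If the iterator runs out before count items have been read, np.fromiter raises a ValueError, so count has to match what the generator really produces.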

Pierre GM answered Dec 11 '22 09:12


You could modify generate_points to yield single floats instead of np.arrays, use np.fromiter to form a 1D array, and then use .reshape(size, -1) to make it a 2D array.

# assumes generate_points has been changed to yield single floats
size = 5
points = np.fromiter(
    generate_points(data,labels,size), dtype=float).reshape(size, -1)
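
Put together, the approach might look roughly like the sketch below. The generator generate_values and the rng argument are hypothetical stand-ins for a modified generate_points, and the sampling uses NumPy's Generator API rather than the shuffle/randint calls from the question.

```python
import numpy as np

data = np.eye(9)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

def generate_values(data, labels, size, rng):
    """Yield the coordinates of `size` point pairs, one float at a time."""
    label_alphabet = np.unique(labels)
    assert label_alphabet.size > 1
    for _ in range(size):
        # Pick two distinct class labels for this pair.
        first, second = rng.choice(label_alphabet, size=2, replace=False)
        for label in (first, second):
            members = data[labels == label]
            point = members[rng.integers(members.shape[0])]
            for value in point:
                yield value

size = 5
rng = np.random.default_rng(0)
flat = np.fromiter(generate_values(data, labels, size, rng), dtype=float,
                   count=size * 2 * data.shape[1])
points = flat.reshape(size, -1)   # one row per pair
```

Each row of points then holds the two sampled points side by side, so its width is 2 * data.shape[1].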

unutbu answered Dec 11 '22 09:12