I would like to create a numpy array from an iterable, which yields tuples of values, such as a database query.
Like so:
data = db.execute('SELECT col1, col2, col3, col4 FROM data')
A = np.array(list(data))
Is there a faster way of doing so, without converting the iterable to a list first?
In Python, a multidimensional array can be represented by nesting one list inside another. A list can hold any number of values of any data type, separated by commas.
Yes, any sequence that has an array-like structure can be passed to the np.array function.
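A small sketch of why the list() call in the question is needed: np.array accepts sequences, but a bare iterator is not treated as one and ends up wrapped as a single object. The rows variable below is made-up sample data, not the asker's query result.

```python
import numpy as np

rows = [(1, 11), (2, 22)]          # made-up sample data

from_list = np.array(rows)         # a sequence of tuples becomes a 2-D array
from_iter = np.array(iter(rows))   # an iterator is stored as one opaque object

print(from_list.shape)   # (2, 2)
print(from_iter.shape)   # ()
print(from_iter.dtype)   # object
```

So np.array alone does not consume an iterator; something like list() or np.fromiter has to do that first.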
If we need to convert a numpy array to tuples, we can use the tuple() function in Python. The tuple() function takes an iterable as an argument and returns a tuple consisting of the elements of the iterable. We first created an array containing tuples as its elements with the np.
I am not an experienced user of numpy, but here is a possible solution for the general question:
>>> i = iter([(1, 11), (2, 22)])
>>> i
<listiterator at 0x5b2de30> # a sample iterable of tuples
>>> rec_array = np.fromiter(i, dtype='i4,i4') # mind the dtype
>>> rec_array # rec_array is a record array
array([(1, 11), (2, 22)],
      dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> rec_array['f0'], rec_array[0] # each field has a default name
(array([1, 2]), (1, 11))
>>> a = rec_array.view(np.int32).reshape(-1,2) # let's create a view
>>> a
array([[ 1, 11],
       [ 2, 22]])
>>> rec_array[0][1] = 23
>>> a # a is a view, not a copy!
array([[ 1, 23],
       [ 2, 22]])
I assume that all columns are of the same type (the view trick above depends on it); otherwise rec_array is already what you want.
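If the columns are not all of the same type and you still want a plain 2-D array, one option (not part of the session above) is structured_to_unstructured from numpy.lib.recfunctions, which casts every field to a common type. A minimal sketch with made-up sample rows:

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Made-up sample data: an int column and a float column
rows = iter([(1, 1.5), (2, 2.5)])

# One field per column; the fields get the default names f0 and f1
rec = np.fromiter(rows, dtype='i4,f8')

# Cast all fields to a common dtype and return an ordinary 2-D array
a = structured_to_unstructured(rec)

print(a.shape)   # (2, 2)
print(a.dtype)   # float64
```

Note that the cast means the integer column comes out as floats here, so this only makes sense when a common numeric type is acceptable.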
Concerning your particular case, I do not completely understand what db is in your example. If it is a cursor object, then you can just call its fetchall method and get a list of tuples. In most cases, the database library does not want to keep a partially read query result around while your code processes each row; that is, by the time the execute method returns, all the data is already stored in a list, so there is hardly any downside to using fetchall instead of iterating over the cursor instance.
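To illustrate the fetchall route, here is a sketch that uses an in-memory sqlite3 database purely as a stand-in, since I do not know what db really is. The table layout and values are invented for the example.

```python
import sqlite3
import numpy as np

# In-memory database standing in for the real one (assumption)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (col1 REAL, col2 REAL)')
conn.executemany('INSERT INTO data VALUES (?, ?)',
                 [(1.0, 2.0), (3.0, 4.0)])

cur = conn.execute('SELECT col1, col2 FROM data ORDER BY col1')
rows = cur.fetchall()                  # a list of tuples, one per row
A = np.array(rows, dtype=np.float64)   # no separate list() call needed
conn.close()

print(A.shape)   # (2, 2)
```

Whether this is any faster than np.array(list(data)) depends on the driver, so it is worth timing on your own data.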
Although technically not an answer to my question, I found a way to do what I am trying to do:
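import numpy as np

def get_cols(db, cols):
    def get_col(col):
        data = db.execute('SELECT ' + col + ' FROM data')
        # np.fromiter requires a dtype; each row is a 1-tuple, so take v[0]
        return np.fromiter((v[0] for v in data), dtype=np.float64)
    # Stack the 1-D columns as rows, then transpose to (n_rows, n_cols)
    return np.vstack([get_col(col) for col in cols]).T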
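A possible variation that needs only one query: flatten the rows with itertools.chain.from_iterable, feed the flat stream to np.fromiter, and reshape. This is a sketch, not what the code above does; rows_to_array is a hypothetical helper name, and it assumes every column fits the same dtype and that the number of columns is known.

```python
from itertools import chain

import numpy as np

def rows_to_array(rows, ncols, dtype=np.float64):
    """Build an (n_rows, ncols) array from an iterable of equal-length tuples."""
    # Flatten the tuples lazily so no intermediate list of rows is built
    flat = np.fromiter(chain.from_iterable(rows), dtype=dtype)
    return flat.reshape(-1, ncols)

# Made-up sample data standing in for the query result
rows = iter([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])
A = rows_to_array(rows, ncols=3)

print(A.shape)   # (2, 3)
```

With a real cursor this would be called as rows_to_array(db.execute('SELECT col1, col2, col3 FROM data'), ncols=3), assuming execute returns an iterable of tuples as in the question.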