Consider the following example in Python 2.7. We have an arbitrary function f() that returns two 1-dimensional numpy arrays. Note that in general f() may returns arrays of different size and that the size may depend on the input.
Now we would like to call map on f() and concatenate the results into two separate new arrays.
import numpy as np
def f(x):
return np.arange(x),np.ones(x,dtype=int)
inputs = np.arange(1,10)
result = map(f,inputs)
x = np.concatenate([i[0] for i in result])
y = np.concatenate([i[1] for i in result])
This gives the intended result. However, since result may take up much memory, it may be preferable to use a generator by calling imap instead of map.
from itertools import imap
result = imap(f,inputs)
x = np.concatenate([i[0] for i in result])
y = np.concatenate([i[1] for i in result])
However, this gives an error because the generator is empty at the point where we calculate y.
Is there a way to use the generator only once and still create these two concatenated arrays? I'm looking for a solution without a for loop, since it is rather inefficient to repeatedly concatenate/append arrays.
Thanks in advance.
Is there a way to use the generator only once and still create these two concatenated arrays?
Yes, a generator can be cloned with tee:
import itertools
a, b = itertools.tee(result)
x = np.concatenate([i[0] for i in a])
y = np.concatenate([i[1] for i in b])
However, using tee does not help with the memory usage in your case. The above solution would require 5 N memory to run:
tee,np.concatenate calls,Clearly, we could do better by dropping the tee:
x_acc = []
y_acc = []
for x_i, y_i in result:
x_acc.append(x_i)
y_acc.append(y_i)
x = np.concatenate(x_acc)
y = np.concatenate(y_acc)
This shaved off one more N, leaving 4 N. Going further means dropping the intermediate lists and preallocating x and y. Note, that you needn't know the exact sizes of the arrays, only the upper bounds:
x = np.empty(capacity)
y = np.empty(capacity)
right = 0
for x_i, y_i in result:
left = right
right += len(x_i) # == len(y_i)
x[left:right] = x_i
y[left:right] = y_i
x = x[:right].copy()
y = y[:right].copy()
In fact, you don't even need an upper bound. Just ensure that x and y are big enough to accommodate the new item:
for x_i, y_i in result:
# ...
if right >= len(x):
# It would be slightly trickier for >1D, but the idea
# remains the same: alter the 0-the dimension to fit
# the new item.
new_capacity = max(right, len(x)) * 1.5
x = x.resize(new_capacity)
y = y.resize(new_capacity)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With