Is there a way to vectorize an operation that takes several numpy arrays and puts them into a list of dictionaries?
Here's a simplified example. The real scenario might involve more arrays and more dictionary keys.
import numpy as np
x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)
print [dict(x=x[ii], y=y[ii], z=z[ii]) for ii in xrange(10)]
I might have thousands or hundreds of thousands of iterations in the xrange call. All the manipulation to create x, y, and z is vectorized (my real code is not as simple as the example above), so there's only one for loop left to get rid of, which I expect would give a huge speedup.
I've tried using map with a function to create the dict, and all sorts of other workarounds. It seems the Python for loop is the slow part (as usual). I'm more or less stuck with dictionaries because of a pre-existing API requirement. Solutions that avoid dicts, using record arrays or something similar, would still be interesting to see, but ultimately I don't think they will work with the existing API.
With your small example, I'm having trouble getting anything faster than the combination of list and dictionary comprehensions:
In [105]: timeit [{'x':i, 'y':j, 'z':k} for i,j,k in zip(x,y,z)]
100000 loops, best of 3: 15.5 µs per loop
In [106]: timeit [{'key':{'x':i, 'y':j, 'z':k}} for i,j,k in zip(x,y,z)]
10000 loops, best of 3: 37.3 µs per loop
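The question mentions more arrays and more keys than the example shows. As a sketch, the zip comprehension generalizes to any number of equal-length 1-D arrays; arrays_to_dicts is a hypothetical helper name, not a NumPy function. Calling .tolist() on each array first means the dict values are plain Python ints rather than NumPy scalars, which an existing API may prefer; any speed difference should be confirmed with timeit on the real data rather than assumed.

```python
import numpy as np

def arrays_to_dicts(names, arrays):
    """Build one dict per row from equal-length 1-D arrays (hypothetical helper).

    .tolist() converts each array to a list of plain Python scalars up front,
    so the comprehension zips plain lists instead of iterating over NumPy
    arrays element by element.
    """
    columns = [a.tolist() for a in arrays]
    return [dict(zip(names, row)) for row in zip(*columns)]

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

records = arrays_to_dicts(('x', 'y', 'z'), (x, y, z))
print(records[0])  # {'x': 0, 'y': 10, 'z': 100}
```

With a small, fixed set of keys, a dict literal as in the timings above may still be quicker than dict(zip(...)); the zip form mainly helps when the keys are only known at run time.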
The alternatives that use array concatenation to join the arrays before partitioning are slower.
In [108]: timeit [{'x':x_, 'y':y_, 'z':z_} for x_, y_, z_ in np.column_stack((x,y,z))]
10000 loops, best of 3: 58.2 µs per loop
=======================
A structured array is easiest to build with recfunctions:
In [109]: from numpy.lib import recfunctions
In [112]: M=recfunctions.merge_arrays((x,y,z))
In [113]: M.dtype.names=['x','y','z']
In [114]: M
Out[114]:
array([(0, 10, 100), (1, 11, 101), (2, 12, 102), (3, 13, 103),
(4, 14, 104), (5, 15, 105), (6, 16, 106), (7, 17, 107),
(8, 18, 108), (9, 19, 109)],
dtype=[('x', '<i4'), ('y', '<i4'), ('z', '<i4')])
In [115]: M['x']
Out[115]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Timing it shows it is much slower to build, but if you want to access all the x values at once, it's much better than fetching them from all the dictionaries.
np.rec.fromarrays((x,y,z),names=['x','y','z']) produces a recarray with the given names. It's about the same speed.
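A short sketch of the np.rec.fromarrays route, showing that a recarray also exposes each field as an attribute (R.x) in addition to the R['x'] indexing used for the structured array above. The variable name R is only for illustration.

```python
import numpy as np

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

# np.rec.fromarrays builds a recarray whose fields are named by `names`.
R = np.rec.fromarrays((x, y, z), names=['x', 'y', 'z'])

# Whole columns by attribute, without touching any per-row Python objects.
print(R.x)      # [0 1 2 3 4 5 6 7 8 9]

# A single row is a record whose fields are also attributes.
print(R[3].y)   # 13
```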
I could also construct an empty array of the right dtype and shape and copy the arrays into it. That's probably as fast as this merge, but more complicated to describe.
I'd suggest optimizing the data structure for use/access rather than construction speed. Generally you construct it once, and use it many times.
============
In [125]: dt=np.dtype([('x',x.dtype),('y',y.dtype),('z',z.dtype)])
In [126]: xyz=np.zeros(x.shape,dtype=dt)
In [127]: xyz['x']=x; xyz['y']=y; xyz['z']=z
# or for n,d in zip(xyz.dtype.names, (x,y,z)): xyz[n] = d
In [128]: xyz
Out[128]:
array([(0, 10, 100), (1, 11, 101), (2, 12, 102), (3, 13, 103),
(4, 14, 104), (5, 15, 105), (6, 16, 106), (7, 17, 107),
(8, 18, 108), (9, 19, 109)],
dtype=[('x', '<i4'), ('y', '<i4'), ('z', '<i4')])
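If the data is kept as a structured array but the existing API still needs dicts, one option is to convert only at the API boundary. This is a sketch assuming that such a conversion point exists; to_dicts is a hypothetical helper. On a structured array, tolist() returns one tuple of plain Python scalars per record, and the field names come from the dtype.

```python
import numpy as np

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

# Same construction as above: a zero-filled structured array filled field by field.
dt = np.dtype([('x', x.dtype), ('y', y.dtype), ('z', z.dtype)])
xyz = np.zeros(x.shape, dtype=dt)
for name, arr in zip(dt.names, (x, y, z)):
    xyz[name] = arr

def to_dicts(struct_arr):
    """Hypothetical helper: one dict per record, keyed by the dtype's field names."""
    names = struct_arr.dtype.names
    # tolist() yields tuples of Python scalars; zip pairs them with the names.
    return [dict(zip(names, rec)) for rec in struct_arr.tolist()]

dicts = to_dicts(xyz)
print(dicts[0])  # {'x': 0, 'y': 10, 'z': 100}
```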
Here is one (Num)?Pythonic way:
In [18]: names = np.array(['x', 'y', 'z'])
In [38]: map(dict, np.dstack((np.repeat(names[None, :], 10, axis=0), np.column_stack((x, y, z)))))
Out[38]:
[{'x': '0', 'y': '10', 'z': '100'},
{'x': '1', 'y': '11', 'z': '101'},
{'x': '2', 'y': '12', 'z': '102'},
{'x': '3', 'y': '13', 'z': '103'},
{'x': '4', 'y': '14', 'z': '104'},
{'x': '5', 'y': '15', 'z': '105'},
{'x': '6', 'y': '16', 'z': '106'},
{'x': '7', 'y': '17', 'z': '107'},
{'x': '8', 'y': '18', 'z': '108'},
{'x': '9', 'y': '19', 'z': '109'}]
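Note that the values in that output are strings ('0', not 0): stacking the string array of names together with the integer columns forces a single common dtype, which is a string dtype. Also, on Python 3 map returns a lazy iterator, so that line would need list(...) around it to produce the list shown. A sketch that keeps the map(dict, ...) shape but leaves the names out of the array, so the values stay integers:

```python
import numpy as np

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)
names = ('x', 'y', 'z')

# column_stack mixes only integer arrays, so the result keeps an integer dtype;
# tolist() turns each row into a list of plain Python ints.
rows = np.column_stack((x, y, z)).tolist()

# Each zip(...) yields (name, value) pairs that dict() consumes;
# list() is needed because map() is lazy on Python 3.
dicts = list(map(dict, (zip(names, row) for row in rows)))
print(dicts[0])  # {'x': 0, 'y': 10, 'z': 100}
```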
Also, note that if you don't need all of the dictionaries at once, you can simply create a generator and access each item on demand.
(dict(x=x[ii], y=y[ii], z=z[ii]) for ii in xrange(10))
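A sketch of the same generator in Python 3 spelling (range instead of xrange), consumed partially with itertools.islice to show that dicts are only built when requested. A generator can be iterated only once, so this fits best when the API consumes the items in a single pass.

```python
import itertools
import numpy as np

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

# Nothing is built yet; each dict is created when the generator is advanced.
lazy = (dict(x=x[ii], y=y[ii], z=z[ii]) for ii in range(len(x)))

# Build only the first three dicts; the generator resumes from row 3 afterwards.
first_three = list(itertools.islice(lazy, 3))
print(len(first_three))  # 3
```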
If you want a nested dictionary, I suggest a list comprehension:
In [88]: inner = np.dstack((np.repeat(names[None, :], 10, axis=0), np.column_stack((x, y))))
In [89]: [{'connection': d} for d in map(dict, inner)]
Out[89]:
[{'connection': {'x': '0', 'y': '10'}},
{'connection': {'x': '1', 'y': '11'}},
{'connection': {'x': '2', 'y': '12'}},
{'connection': {'x': '3', 'y': '13'}},
{'connection': {'x': '4', 'y': '14'}},
{'connection': {'x': '5', 'y': '15'}},
{'connection': {'x': '6', 'y': '16'}},
{'connection': {'x': '7', 'y': '17'}},
{'connection': {'x': '8', 'y': '18'}},
{'connection': {'x': '9', 'y': '19'}}]
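As with the flat version, the inner values above come out as strings. A sketch of the same nested list comprehension built from plain Python ints instead; the outer key 'connection' is taken from the output above, and inner_keys is just an illustrative name.

```python
import numpy as np

x = np.arange(10)
y = np.arange(10, 20)

inner_keys = ('x', 'y')
# Build each inner dict from plain Python ints, then wrap it under 'connection'.
nested = [{'connection': dict(zip(inner_keys, row))}
          for row in zip(x.tolist(), y.tolist())]
print(nested[0])  # {'connection': {'x': 0, 'y': 10}}
```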