Is there any package in Python that provides a dictionary for vectorized access with NumPy arrays? I am looking for something like this:
>>> vector_dict = VectorizedDict({1: "One",
... 2: "Two",
... 3: "Three"},
... dtype_key=int, dtype_val="U5")
>>> a = np.array([1,2,3])
>>> b = vector_dict[a]
>>> print(type(b))
<class 'numpy.ndarray'>
>>> print(b)
['One' 'Two' 'Three']
Although this result would also be possible to achieve by iterating over the array elements, the iteration approach would be rather inefficient for large arrays.
EDIT:
For small dictionaries I use the following approach:
b = np.empty(a.shape, dtype="U5")  # output array, one slot per element of a
for key, val in my_dict.items():
    b[a == key] = val
Although boolean masking is quite efficient when iterating over small dictionaries, it is time-consuming for large ones (thousands of key-value pairs).
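The cost of that loop grows with the number of dictionary entries. One possible alternative, sketched here only as an illustration, is to do one dictionary lookup per distinct value in the array and then expand the result with a single indexing step. The names my_dict, uniq and uniq_vals are illustrative, and the sketch assumes every value in a is a key of the dictionary.

```python
import numpy as np

my_dict = {1: "One", 2: "Two", 3: "Three"}
a = np.array([1, 2, 3, 2, 3, 1])

# Distinct values in `a`, plus for each element its position in `uniq`.
uniq, inverse = np.unique(a, return_inverse=True)

# One Python-level dict lookup per distinct value
# (raises KeyError if a value of `a` is not a key).
uniq_vals = np.array([my_dict[key] for key in uniq.tolist()])

# Expand back to the shape of `a` with a single fancy-indexing step.
b = uniq_vals[inverse].reshape(a.shape)
print(b)
```

This helps most when the array contains many repeated values, since the Python-level work then depends on the number of distinct values rather than on the size of the dictionary.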
Pandas data structures implement this functionality for 1D (pd.Series), 2D (pd.DataFrame) and 3D (pd.Panel, deprecated and since removed from pandas) data:
import numpy as np
import pandas as pd
s = pd.Series(data=['One', 'Two', 'Three'], index=[1, 2, 3])
a = np.array([1, 2, 3])
b = s[a]
print(b.values)
['One' 'Two' 'Three']
For higher-dimensional structures, you have xarray.
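If the mapping starts out as a plain dict, as in the question, Series.map may be a convenient route: it accepts a dict and looks up every element in one call. The following is a minimal sketch; the variable names are illustrative, and note that keys missing from the dict come back as missing values instead of raising an error.

```python
import numpy as np
import pandas as pd

my_dict = {1: "One", 2: "Two", 3: "Three"}
a = np.array([3, 1, 2, 1])

# Series.map looks up every element of `a` in the dict in one call.
b = pd.Series(a).map(my_dict).to_numpy()
print(b)

# Keys absent from the dict map to a missing value (NaN) rather than raising.
c = pd.Series(np.array([1, 4])).map(my_dict)
print(c.isna().tolist())
```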
I have written a vectorized Python dictionary/set that stores data efficiently and uses NumPy arrays. Most combinations of NumPy datatypes are supported.
You can find the project and documentation here: https://github.com/atom-moyer/getpy
Here are two approaches -
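import numpy as np

def lookup_dict_app1(vector_dict, a):
    # Keys and values as arrays; a dict keeps both in matching order.
    k = np.array(list(vector_dict.keys()))
    v = np.array(list(vector_dict.values()))
    sidx = k.argsort()  # order that sorts the keys
    # Locate each element of a among the sorted keys, then map back to v.
    # Assumes every element of a is a key of vector_dict.
    return v[sidx[np.searchsorted(k,a,sorter=sidx)]].tolist()

def lookup_dict_app2(vector_dict, a):
    k = np.array(list(vector_dict.keys()))
    v = list(vector_dict.values())  # a list, so values can be indexed in Python 3
    sidx = k.argsort()
    indx = sidx[np.searchsorted(k,a,sorter=sidx)]
    out = [v[i] for i in indx]
    return out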
If the keys obtained with vector_dict.keys() are already sorted, skip the argsort() and indexing with sidx steps. Or, we can do a simple check and get the modified versions, like so -
def lookup_dict_app1_mod(vector_dict, a):
    k = np.array(list(vector_dict.keys()))
    v = np.array(list(vector_dict.values()))
    if (k[1:] >= k[:-1]).all():  # keys already sorted, no argsort needed
        return v[np.searchsorted(k,a)].tolist()
    else:
        sidx = k.argsort()
        return v[sidx[np.searchsorted(k,a,sorter=sidx)]].tolist()

def lookup_dict_app2_mod(vector_dict, a):
    k = np.array(list(vector_dict.keys()))
    v = list(vector_dict.values())  # a list, so values can be indexed in Python 3
    if (k[1:] >= k[:-1]).all():  # keys already sorted, no argsort needed
        return [v[i] for i in np.searchsorted(k,a)]
    else:
        sidx = k.argsort()
        indx = sidx[np.searchsorted(k,a,sorter=sidx)]
        return [v[i] for i in indx]
Sample run -
In [166]: vector_dict = {1: 'One', 2: 'Two', 3: 'Three', 0:'Zero'}
In [167]: a = np.array([1,2,3,2,3,1])
In [168]: lookup_dict_app1(vector_dict, a)
Out[168]: ['One', 'Two', 'Three', 'Two', 'Three', 'One']
In [169]: lookup_dict_app2(vector_dict, a)
Out[169]: ['One', 'Two', 'Three', 'Two', 'Three', 'One']
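One caveat with np.searchsorted is that it returns an insertion position even for values that are not among the keys, so a query that is not in the dictionary can silently map to a neighbouring key's value. The sketch below adds a membership check and raises KeyError in that case. The function name lookup_dict_checked is hypothetical and not part of the approaches above, and the sketch assumes a non-empty dictionary.

```python
import numpy as np

def lookup_dict_checked(vector_dict, a):
    """Vectorized lookup that raises KeyError for values of `a` missing from the dict."""
    k = np.array(list(vector_dict.keys()))
    v = np.array(list(vector_dict.values()))
    a = np.asarray(a)

    sidx = k.argsort()
    k_sorted = k[sidx]

    # Insertion position of each query in the sorted keys; this can equal
    # len(k_sorted) for queries larger than every key, so clip it.
    pos = np.searchsorted(k_sorted, a)
    pos = np.minimum(pos, len(k_sorted) - 1)

    # A query is present only if the key at its position equals the query.
    found = k_sorted[pos] == a
    if not found.all():
        missing = np.unique(a[~found]).tolist()
        raise KeyError(f"keys not in dictionary: {missing}")

    return v[sidx[pos]]

vector_dict = {1: 'One', 2: 'Two', 3: 'Three', 0: 'Zero'}
result = lookup_dict_checked(vector_dict, np.array([1, 2, 3, 2, 3, 1]))
print(result)
```

Unlike the versions above, this returns a NumPy array instead of a list; call .tolist() on the result if a list is needed.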