How to use a dictionary to translate/replace elements of an array? [duplicate]

I have a numpy array containing hundreds of capital letters, in no particular order:

import numpy as np
abc_array = np.array(['B', 'D', 'A', 'F', 'H', 'I', 'Z', 'J', ...])

Each element in this numpy.ndarray is a numpy.string_.

I also have a "translation dictionary" whose key/value pairs map each capital letter to a city:

transdict = {'A': 'Adelaide', 'B': 'Bombay', 'C': 'Cologne',...}

There are only 26 pairs in the dictionary transdict, but there are hundreds of letters in the numpy array I must translate.

What is the most efficient way to do this?

I have considered using numpy.core.defchararray.replace(a, old, new, count=None), but this raises a ValueError, as the numpy array is a different size than the dictionary keys/values.

AttributeError: 'numpy.ndarray' object has no attribute 'translate'

ShanZhengYang asked Nov 04 '15 18:11


1 Answer

With brute-force NumPy broadcasting -

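keys = np.array(list(transdict.keys()))  # dict views must be turned into arrays under Python 3
vals = np.array(list(transdict.values()))
idx = np.nonzero(keys == abc_array[:,None])[1]  # column index of the matching key for each letter
out = vals[idx]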
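
To make the broadcasting step concrete, here is a minimal self-contained sketch, assuming Python 3 and the small four-city dictionary from the sample run. The names keys, values and mask are illustrative and not part of the original answer. Note that the intermediate boolean array has one row per letter and one column per key, so memory grows with len(abc_array) * len(transdict).

```python
import numpy as np

abc_array = np.array(['B', 'D', 'A', 'B', 'D', 'A', 'C'])
transdict = {'A': 'Adelaide', 'B': 'Bombay', 'C': 'Cologne', 'D': 'Delhi'}

# Materialize keys and values as arrays; both come from the same dict,
# so position j in keys corresponds to position j in values.
keys = np.array(list(transdict.keys()))
values = np.array(list(transdict.values()))

# mask[i, j] is True when abc_array[i] equals keys[j].
# Shape is (len(abc_array), len(keys)), here (7, 4).
mask = abc_array[:, None] == keys

# Assuming every letter appears in the dictionary, each row holds exactly
# one True, so the column indices give the key position for each letter.
idx = np.nonzero(mask)[1]
out = values[idx]
print(out.tolist())
```

If a letter were missing from the dictionary, its row would contain no True and out would come back shorter than abc_array, so this sketch relies on every letter being a key.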

With np.searchsorted based searching and indexing -

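keys = np.array(list(transdict.keys()))  # dict views must be turned into arrays under Python 3
vals = np.array(list(transdict.values()))
sort_idx = np.argsort(keys)  # permutation that sorts the keys
idx = np.searchsorted(keys,abc_array,sorter = sort_idx)  # position of each letter among the sorted keys
out = vals[sort_idx][idx]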
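
The searchsorted idea can also be wrapped in a small function. The sketch below assumes Python 3; the helper name translate and the missing-key check are additions for illustration, not part of the original answer. The check is there because np.searchsorted returns insertion points, so a letter absent from the dictionary would otherwise be mapped silently to a neighbouring key or produce an out-of-range index.

```python
import numpy as np

def translate(letters, mapping):
    """Map each element of `letters` through `mapping` via a sorted lookup.

    `translate` is a hypothetical helper name used for this sketch.
    """
    keys = np.array(list(mapping.keys()))
    values = np.array(list(mapping.values()))

    # Sort the keys once; the values are reordered with the same permutation.
    sort_idx = np.argsort(keys)
    sorted_keys = keys[sort_idx]
    sorted_values = values[sort_idx]

    # Binary search for every letter: O(n log k) instead of an n-by-k mask.
    pos = np.searchsorted(sorted_keys, letters)

    # searchsorted yields insertion points, so confirm each hit is exact.
    clipped = np.clip(pos, 0, len(sorted_keys) - 1)
    if not np.array_equal(sorted_keys[clipped], letters):
        raise KeyError("some elements have no entry in the mapping")

    return sorted_values[pos]

abc_array = np.array(['B', 'D', 'A', 'B', 'D', 'A', 'C'])
# Deliberately unsorted insertion order to show that sorting is handled inside.
transdict = {'D': 'Delhi', 'A': 'Adelaide', 'C': 'Cologne', 'B': 'Bombay'}
print(translate(abc_array, transdict).tolist())
```

Compared with the broadcasting version, this avoids building the full letters-by-keys boolean array, which matters more as either side grows.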

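Sample run (recorded under Python 2, where transdict.keys() and transdict.values() return lists and strings are stored as bytes, hence dtype='|S8') -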

In [1]: abc_array = np.array(['B', 'D', 'A', 'B', 'D', 'A', 'C'])
   ...: transdict = {'A': 'Adelaide', 'B': 'Bombay', 'C': 'Cologne', 'D': 'Delhi'}
   ...: 

In [2]: idx = np.nonzero(transdict.keys() == abc_array[:,None])[1]
   ...: out = np.asarray(transdict.values())[idx]
   ...: 

In [3]: out
Out[3]: 
array(['Bombay', 'Delhi', 'Adelaide', 'Bombay', 'Delhi', 'Adelaide',
       'Cologne'], 
      dtype='|S8')

In [4]: sort_idx = np.argsort(transdict.keys())
   ...: idx = np.searchsorted(transdict.keys(),abc_array,sorter = sort_idx)
   ...: out = np.asarray(transdict.values())[sort_idx][idx]
   ...: 

In [5]: out
Out[5]: 
array(['Bombay', 'Delhi', 'Adelaide', 'Bombay', 'Delhi', 'Adelaide',
       'Cologne'], 
      dtype='|S8')
Divakar answered Oct 16 '22 12:10