I have an unsorted array of numbers.
I need to replace certain numbers (given in a list) with specific alternatives (also given in a corresponding list)
I wrote the following code (which seems to works):
import numpy as np
numbers = np.arange(0,40)
np.random.shuffle(numbers)
problem_numbers = [33, 23, 15] # table, night_stand, plant
alternative_numbers = [12, 14, 26] # desk, dresser, flower_pot
for i in range(len(problem_numbers)):
idx = numbers == problem_numbers[i]
numbers[idx] = alternative_numbers[i]
However, this seems highly inefficient (this needs to be done several millions of times for much larger arrays).
I found this question which answers a similar problem however in my case the numbers are not sorted and they need to maintain their original location.
Note: numbers
may contain multiple or no occurrences of elements in problem_numbers
char. replace() method. In Python, this function is used to return a copy of the numpy array of string and this method is available in the NumPy package module. In Python this method will check the condition if the argument count is given, then only the first count occurrences is replaced.
replace() function, each element in arr, return a copy of the string with all occurrences of substring old replaced by new. Syntax : numpy.core.defchararray.replace(arr, old, new, count = None) Parameters : arr : [array-like of str] Given array-like of string.
EDIT: I implemented a TensorFlow version of this in this answer (almost exactly the same, except replacements are a dict).
Here is a simple way to do it:
import numpy as np
numbers = np.arange(0,40)
np.random.shuffle(numbers)
problem_numbers = [33, 23, 15] # table, night_stand, plant
alternative_numbers = [12, 14, 26] # desk, dresser, flower_pot
# Replace values
problem_numbers = np.asarray(problem_numbers)
alternative_numbers = np.asarray(alternative_numbers)
n_min, n_max = numbers.min(), numbers.max()
replacer = np.arange(n_min, n_max + 1)
# Mask replacements out of range
mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)
replacer[problem_numbers[mask] - n_min] = alternative_numbers[mask]
numbers = replacer[numbers - n_min]
This works well an should be efficient as long as the range of the values in numbers
(the difference between the smallest and the biggest) is not huge (e.g you don't have something like 1
, 7
and 10000000000
).
Benchmarking
I've compared the code in the OP with the three (as of now) proposed solutions with this code:
import numpy as np
def method_itzik(numbers, problem_numbers, alternative_numbers):
numbers = np.asarray(numbers)
for i in range(len(problem_numbers)):
idx = numbers == problem_numbers[i]
numbers[idx] = alternative_numbers[i]
return numbers
def method_mseifert(numbers, problem_numbers, alternative_numbers):
numbers = np.asarray(numbers)
replacer = dict(zip(problem_numbers, alternative_numbers))
numbers_list = numbers.tolist()
numbers = np.array(list(map(replacer.get, numbers_list, numbers_list)))
return numbers
def method_divakar(numbers, problem_numbers, alternative_numbers):
numbers = np.asarray(numbers)
problem_numbers = np.asarray(problem_numbers)
problem_numbers = np.asarray(alternative_numbers)
# Pre-process problem_numbers and correspondingly alternative_numbers
# such that repeats and no matches are taken care of
sidx_pn = problem_numbers.argsort()
pn = problem_numbers[sidx_pn]
mask = np.concatenate(([True],pn[1:] != pn[:-1]))
an = alternative_numbers[sidx_pn]
minN, maxN = numbers.min(), numbers.max()
mask &= (pn >= minN) & (pn <= maxN)
pn = pn[mask]
an = an[mask]
# Pre-pocessing done. Now, we need to use pn and an in place of
# problem_numbers and alternative_numbers repectively. Map, index and assign.
sidx = numbers.argsort()
idx = sidx[np.searchsorted(numbers, pn, sorter=sidx)]
valid_mask = numbers[idx] == pn
numbers[idx[valid_mask]] = an[valid_mask]
def method_jdehesa(numbers, problem_numbers, alternative_numbers):
numbers = np.asarray(numbers)
problem_numbers = np.asarray(problem_numbers)
alternative_numbers = np.asarray(alternative_numbers)
n_min, n_max = numbers.min(), numbers.max()
replacer = np.arange(n_min, n_max + 1)
# Mask replacements out of range
mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)
replacer[problem_numbers[mask] - n_min] = alternative_numbers[mask]
numbers = replacer[numbers - n_min]
return numbers
The results:
import numpy as np
np.random.seed(100)
MAX_NUM = 100000
numbers = np.random.randint(0, MAX_NUM, size=100000)
problem_numbers = np.unique(np.random.randint(0, MAX_NUM, size=500))
alternative_numbers = np.random.randint(0, MAX_NUM, size=len(problem_numbers))
%timeit method_itzik(numbers, problem_numbers, alternative_numbers)
10 loops, best of 3: 63.3 ms per loop
# This method expects lists
problem_numbers_l = list(problem_numbers)
alternative_numbers_l = list(alternative_numbers)
%timeit method_mseifert(numbers, problem_numbers_l, alternative_numbers_l)
10 loops, best of 3: 20.5 ms per loop
%timeit method_divakar(numbers, problem_numbers, alternative_numbers)
100 loops, best of 3: 9.45 ms per loop
%timeit method_jdehesa(numbers, problem_numbers, alternative_numbers)
1000 loops, best of 3: 822 µs per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With