I have a fairly large 1D NumPy array Xold. Its values should be replaced according to the rule specified by a 2D NumPy array Y. For example:
Xold=np.array([0,1,2,3,4])
Y=np.array([[0,0],[1,100],[3,300],[4,400],[2,200]])
Whenever a value in Xold is identical to a value in Y[:,0], the new value in Xnew should be the corresponding value in Y[:,1]. This is accomplished by two nested for loops:
Xnew=np.zeros(len(Xold))
for i in range(len(Xold)):
    for j in range(len(Y)):
        if Xold[i]==Y[j,0]:
            Xnew[i]=Y[j,1]
With the given example, this yields Xnew=[0,100,200,300,400].
However, for large data sets this procedure is quite slow. What is a faster and more elegant way to accomplish this task?
SELECTING THE FASTEST METHOD
Answers to this question provided a nice assortment of ways to replace elements in a NumPy array. Let's check which one is the quickest.
TL;DR: Numpy indexing is the winner
def meth1():  # suggested by @Slam; modifies Xold in place
    for old, new in Y:
        Xold[Xold == old] = new

def meth2():  # suggested by myself; convert y_dict = dict(Y) first
    [y_dict[i] if i in y_dict.keys() else i for i in Xold]

def meth3():  # suggested by @Eelco Hoogendoorn; import numpy_indexed as npi first
    npi.remap(Xold, keys=Y[:, 0], values=Y[:, 1])

def meth4():  # suggested by @Brad Solomon; import pandas as pd first
    pd.Series(Xold).map(pd.Series(Y[:, 1], index=Y[:, 0])).values

# suggested by @jdehesa; create Xnew = Xold.copy() and the index
# idx = np.searchsorted(Xold, Y[:, 0]) first
def meth5():
    Xnew[idx] = Y[:, 1]
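The helper objects named in the comments above (y_dict, Xnew, idx) have to exist before the functions are called. Here is a minimal sketch of that setup, using the example arrays from the question. It assumes Xold is sorted, has unique values, and that every value appears in Y[:, 0], which is what makes the searchsorted index valid here. The name via_dict is only illustrative.

```python
import numpy as np

Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])

# Setup for meth2: a plain dict mapping old value -> new value.
y_dict = dict(Y)
via_dict = [y_dict[i] if i in y_dict else i for i in Xold]

# Setup for meth5: positions in Xold at which each key of Y is found.
# This relies on the assumptions stated above (sorted, unique, all keys present).
Xnew = Xold.copy()
idx = np.searchsorted(Xold, Y[:, 0])
Xnew[idx] = Y[:, 1]

# Both approaches should agree on the example data.
print(Xnew.tolist())
print([int(v) for v in via_dict])
```

Because idx is built once outside the function, a benchmark of meth5 measures only the final assignment.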
Not-so-surprising results (total seconds for 1,000,000 calls, lower is better):
 In [39]: timeit.timeit(meth1, number=1000000)                                                                      
 Out[39]: 12.08
 In [40]: timeit.timeit(meth2, number=1000000)                                                                      
 Out[40]: 2.87
 In [38]: timeit.timeit(meth3, number=1000000)                                                                      
 Out[38]: 55.39
 In [12]: timeit.timeit(meth4, number=1000000)                                                                                      
 Out[12]: 256.84
 In [50]: timeit.timeit(meth5, number=1000000)                                                                                      
 Out[50]: 1.12
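So the good old list comprehension is the second fastest, and the winning approach is NumPy indexing combined with searchsorted(). Keep in mind that meth5 times only the final indexed assignment, because idx is computed beforehand, and that meth1 modifies Xold in place.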
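The searchsorted trick as benchmarked works for the example because Xold is sorted and contains every key exactly once. For arbitrary input, one possible generalization is to sort the keys of Y instead and look up each element of Xold in them. The sketch below is an assumption-laden illustration, not the method from any of the answers: the function name remap is hypothetical, values without a matching key are left unchanged, and the mapping is assumed to be non-empty with unique keys.

```python
import numpy as np

def remap(x, mapping):
    """Return a copy of x with values replaced per mapping (rows of [old, new]).

    Values of x that do not occur in mapping[:, 0] are kept as they are.
    Assumes mapping is non-empty and its keys are unique.
    """
    order = np.argsort(mapping[:, 0])      # sort the rules by key
    keys = mapping[order, 0]
    values = mapping[order, 1]

    pos = np.searchsorted(keys, x)         # candidate position of each x in keys
    pos = np.clip(pos, 0, len(keys) - 1)   # keep out-of-range lookups indexable
    found = keys[pos] == x                 # True only where the key really matches
    return np.where(found, values[pos], x)

# Unsorted input with a repeated value (4) and a value without a rule (7).
Xold = np.array([3, 0, 4, 4, 7, 1])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
Xnew = remap(Xold, Y)
print(Xnew.tolist())
```

The extra argsort makes this slower than a bare indexed assignment, but it stays fully vectorized and does not depend on the order of Xold.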