
Fast replacement of values in a numpy array

I have a very large numpy array (containing up to a million elements) like the one below:

[0,1,6,5,1,2,7,6,2,3,8,7,3,4,9,8,5,6,11,10,6,7,12,11,7, 8,13,12,8,9,14,13,10,11,16,15,11,12,17,16,12,13,18,17,13, 14,19,18,15,16,21,20,16,17,22,21,17,18,23,22,18,19,24,23] 

and a small dictionary map for replacing some of the elements in the above array

{4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0} 

I would like to replace some of the elements according to the map above. The numpy array is really large, and only a small subset of the elements (occurring as keys in the dictionary) will be replaced with the corresponding values. What is the fastest way to do this?
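To make the intended result concrete, here is a small Python 3 reference sketch (not part of the original question) using the array and map above. Every element that appears as a key is replaced by its value, and every other element is left unchanged. The helper name `reference_replace` is hypothetical. It is slow, but it is useful for checking faster versions.

```python
import numpy as np

# The array and replacement map from the question.
arr = np.array([0, 1, 6, 5, 1, 2, 7, 6, 2, 3, 8, 7, 3, 4, 9, 8,
                5, 6, 11, 10, 6, 7, 12, 11, 7, 8, 13, 12, 8, 9, 14, 13,
                10, 11, 16, 15, 11, 12, 17, 16, 12, 13, 18, 17, 13, 14, 19, 18,
                15, 16, 21, 20, 16, 17, 22, 21, 17, 18, 23, 22, 18, 19, 24, 23])
mapping = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}


def reference_replace(a, d):
    """Hypothetical helper: element-by-element lookup, slow but obviously correct."""
    # d.get(x, x) returns the mapped value for keys and x itself otherwise.
    return np.array([d.get(int(x), int(x)) for x in a], dtype=a.dtype)


expected = reference_replace(arr, mapping)
print(expected[12:16])  # → [3 0 5 8]
```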

asked Aug 04 '10 at 08:08 by D R




1 Answer

I believe there's an even more efficient method, but for now, try:

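from numpy import copy

# theArray is the large input array, d the replacement dictionary.
newArray = copy(theArray)
for k, v in d.items():  # items() works in both Python 2 and 3
    # Build the mask from theArray, not newArray, so replaced values are never remapped.
    newArray[theArray == k] = v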
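A self-contained Python 3 sketch of the same mask-per-key loop, with a small usage example. The function name `replace_by_masks` and the chained map `{1: 2, 2: 3}` are illustrative assumptions. Because each mask is computed on the untouched input rather than on the copy being modified, a value written for one key is never matched again by a later key.

```python
import numpy as np


def replace_by_masks(a, d):
    """Return a copy of `a` with every element equal to a key of `d` replaced by d[key]."""
    out = a.copy()
    for k, v in d.items():
        # Compare against the original array `a`, not `out`, so that a value
        # written for an earlier key is never remapped by a later one.
        out[a == k] = v
    return out


# Usage with a chained map: 1 -> 2 and 2 -> 3 must not turn a 1 into a 3.
a = np.array([1, 2, 3, 1])
print(replace_by_masks(a, {1: 2, 2: 3}))  # → [2 3 3 2]
```

Each key costs one full pass over the array, which stays cheap as long as the dictionary is small, as it is in the question.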

Microbenchmark and test for correctness:

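#!/usr/bin/env python2.7
# Saved as w.py (the timeit commands below import from it).

from numpy import copy, random, arange

random.seed(0)
data = random.randint(30, size=10**5)

d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
dk = d.keys()
dv = d.values()

# f1: one boolean-mask assignment per dictionary key.
def f1(a, d):
    b = copy(a)
    for k, v in d.iteritems():
        b[a==k] = v
    return b

# f2: plain Python loop over every element; serves as the reference result.
def f2(a, d):
    for i in xrange(len(a)):
        a[i] = d.get(a[i], a[i])
    return a

# f3: lookup table covering 0..max(a) with the keys remapped, indexed by a.
def f3(a, dk, dv):
    mp = arange(0, max(a)+1)
    mp[dk] = dv
    return mp[a]


a = copy(data)
res = f2(a, d)  # f2 modifies its argument in place, hence the copy

assert (f1(data, d) == res).all()
assert (f3(data, dk, dv) == res).all()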
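A note on `f3`, which is not changed in the measurements that follow. It calls Python's built-in `max()`, which walks the array one element at a time in the interpreter, whereas `a.max()` is a single vectorized NumPy reduction. That may account for part of `f3`'s time, although I have not measured it. Also, `mp[dk] = dv` raises an `IndexError` if any key is larger than every value in the array. Below is a Python 3 sketch of a variant that addresses both points. It assumes the array holds non-negative integers, because negative values would silently index the table from the end. The name `replace_by_table` is hypothetical.

```python
import numpy as np


def replace_by_table(a, d):
    """Map a non-negative integer array `a` through dict `d` via a dense lookup table."""
    if not d or a.size == 0:
        return a.copy()
    keys = np.array(list(d.keys()), dtype=np.intp)
    vals = np.array(list(d.values()), dtype=a.dtype)
    # Size the table to cover both the data and every key, so that
    # keys larger than a.max() do not raise IndexError.
    size = max(int(a.max()), int(keys.max())) + 1
    table = np.arange(size, dtype=a.dtype)  # identity map: unmapped values stay put
    table[keys] = vals
    return table[a]  # one fancy-indexing pass over the data


a = np.array([0, 4, 9, 3])
print(replace_by_table(a, {4: 0, 9: 5, 24: 0}))  # → [0 0 5 3]
```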

Result:

$ python2.7 -m timeit -s 'from w import f1,f3,data,d,dk,dv' 'f1(data,d)'
100 loops, best of 3: 6.15 msec per loop

$ python2.7 -m timeit -s 'from w import f1,f3,data,d,dk,dv' 'f3(data,dk,dv)'
100 loops, best of 3: 19.6 msec per loop
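The answer suggests a more efficient method may exist. One commonly used alternative, not from this thread, is to sort the keys once and use `np.searchsorted` to locate each element among them. Elements that actually equal a key take the corresponding value. Unlike a dense lookup table, this does not allocate a table sized by the largest value, and it also handles negative or very large integers. Each element costs a binary search over the sorted keys. This is a Python 3 sketch under those assumptions and has not been benchmarked here. The name `replace_by_searchsorted` is hypothetical.

```python
import numpy as np


def replace_by_searchsorted(a, d):
    """Replace elements of `a` that equal a key of `d`, using sorted keys and binary search."""
    if not d:
        return a.copy()
    keys = np.array(list(d.keys()))
    vals = np.array(list(d.values()), dtype=a.dtype)
    order = np.argsort(keys)
    keys, vals = keys[order], vals[order]
    # Position where each element would be inserted among the sorted keys.
    idx = np.searchsorted(keys, a)
    # Elements beyond the largest key get idx == len(keys); clip so indexing is safe.
    idx = np.clip(idx, 0, len(keys) - 1)
    hit = keys[idx] == a  # True only where the element really is a key
    out = a.copy()
    out[hit] = vals[idx[hit]]
    return out


a = np.array([-7, 4, 10**9, 3])
print(replace_by_searchsorted(a, {4: 0, -7: 1, 10**9: 2}))  # → [1 0 2 3]
```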
answered Sep 16 '22 at 23:09 by kennytm
