I have a very large NumPy array (containing up to a million elements) like the one below:
[0,1,6,5,1,2,7,6,2,3,8,7,3,4,9,8,5,6,11,10,6,7,12,11,7,8,13,12,8,9,14,13,10,11,16,15,11,12,17,16,12,13,18,17,13,14,19,18,15,16,21,20,16,17,22,21,17,18,23,22,18,19,24,23]
I also have a small dictionary that maps some of the elements in the above array to their replacements:
{4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
I would like to replace some of the elements according to the map above. The numpy array is really large, and only a small subset of the elements (occurring as keys in the dictionary) will be replaced with the corresponding values. What is the fastest way to do this?
The np.char.replace() function returns a copy of a NumPy array of strings in which occurrences of a substring are replaced with a new string. If the count argument is given, only the first count occurrences in each element are replaced.
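As a small illustration of the count behaviour described above, here is a minimal sketch. The array contents and variable names are made up for the example, and note that this function works on string arrays, not on the integer array from the question.

```python
import numpy as np

# Hypothetical sample data, chosen only to show the effect of count.
words = np.array(["aaa-bbb-aaa", "aaa"])

# Without count, every occurrence of "aaa" in each element is replaced.
all_replaced = np.char.replace(words, "aaa", "x")

# With count=1, only the first occurrence in each element is replaced.
first_only = np.char.replace(words, "aaa", "x", count=1)

print(all_replaced)
print(first_only)
```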
By explicitly declaring the "ndarray" data type, your array processing can be 1250x faster. This tutorial will show you how to speed up the processing of NumPy arrays using Cython. By explicitly specifying the data types of variables in Python, Cython can give drastic speed increases at runtime.
In NumPy, to replace missing values (NaN, np.nan) in an ndarray with other numbers, use np.nan_to_num() or np.isnan().
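The two options mentioned above might look like this in practice. This is a minimal sketch with made-up data; the fill value -1.0 is arbitrary, and the nan= keyword of np.nan_to_num() requires NumPy 1.17 or later.

```python
import numpy as np

# Hypothetical sample data containing two missing values.
a = np.array([1.0, np.nan, 3.0, np.nan])

# Option 1: nan_to_num returns a copy with NaN replaced
# (0.0 by default; nan= selects another fill value).
filled = np.nan_to_num(a, nan=-1.0)

# Option 2: build a boolean mask with isnan and assign through it.
b = a.copy()
b[np.isnan(b)] = -1.0

print(filled)
print(b)
```

Be aware that np.nan_to_num() also replaces positive and negative infinity with very large finite numbers, whereas the np.isnan() mask touches only NaN.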
I believe there is an even more efficient method, but for now, try:

    from numpy import copy

    newArray = copy(theArray)
    # d is the replacement dictionary; items() works on Python 2 and 3.
    for k, v in d.items():
        newArray[theArray == k] = v
Microbenchmark and test for correctness:
    #!/usr/bin/env python2.7
    # Saved as w.py, which is the module the timeit commands import.
    from numpy import copy, random, arange

    random.seed(0)
    data = random.randint(30, size=10**5)

    d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
    dk = d.keys()
    dv = d.values()

    def f1(a, d):
        # One boolean-mask assignment per dictionary entry.
        b = copy(a)
        for k, v in d.iteritems():
            b[a==k] = v
        return b

    def f2(a, d):
        # Reference implementation: per-element lookup, modifies a in place.
        for i in xrange(len(a)):
            a[i] = d.get(a[i], a[i])
        return a

    def f3(a, dk, dv):
        # Lookup table: identity mapping, overwritten at the dictionary keys.
        mp = arange(0, max(a)+1)
        mp[dk] = dv
        return mp[a]

    a = copy(data)
    res = f2(a, d)

    assert (f1(data, d) == res).all()
    assert (f3(data, dk, dv) == res).all()
Result:
    $ python2.7 -m timeit -s 'from w import f1,f3,data,d,dk,dv' 'f1(data,d)'
    100 loops, best of 3: 6.15 msec per loop
    $ python2.7 -m timeit -s 'from w import f1,f3,data,d,dk,dv' 'f3(data,dk,dv)'
    100 loops, best of 3: 19.6 msec per loop
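For readers on Python 3, the lookup-table idea behind f3 can be sketched as follows. This is an untimed variant, not the code that was benchmarked. The function name replace_with_lookup is made up for the example, and it assumes a non-empty array of non-negative integers and a non-empty dictionary with non-negative keys. One possible reason f3 is slow in the benchmark is that the built-in max() iterates over the array element by element; the sketch uses the array's own max() method instead, but no speed claim is made here.

```python
import numpy as np

def replace_with_lookup(a, mapping):
    """Return a copy of a with the keys of mapping replaced by their values.

    Assumes a is a non-empty array of non-negative integers and mapping
    is a non-empty dict with non-negative integer keys (hypothetical helper).
    """
    # The table must cover every value in a and every key in mapping.
    size = max(int(a.max()), max(mapping)) + 1
    # Start from the identity mapping so unlisted values stay unchanged.
    lookup = np.arange(size, dtype=a.dtype)
    keys = np.fromiter(mapping.keys(), dtype=np.intp, count=len(mapping))
    values = np.fromiter(mapping.values(), dtype=a.dtype, count=len(mapping))
    lookup[keys] = values
    # Fancy indexing applies the table to every element at once.
    return lookup[a]

data = np.array([0, 1, 6, 5, 4, 9, 14, 19, 20, 24])
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
result = replace_with_lookup(data, d)
print(result)
```

The table costs memory proportional to the largest value, so this approach fits best when the values are small relative to the array length, as in the question.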