I have an array of integers of length 150 and the integers range from 1 to 3. For example,
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
I would like to convert/map/transform
1 to [0,0,1]
2 to [0,1,0]
3 to [1,0,0]
Is there an efficient way to do that?
So the output would look like
[0,0,1],[0,0,1],[0,0,1]...[1,0,0]
First, encode your transform as an array (with a dummy first element since you don't map 0):
>>> mapping = np.array([[0,0,0],[0,0,1],[0,1,0],[1,0,0]])
Then it's trivial:
>>> arr = np.array([1,1,2,3,3,3])
>>> mapping[arr]
array([[0, 0, 1],
       [0, 0, 1],
       [0, 1, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0]])
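The lookup table does not have to be typed by hand. A minimal sketch, not part of the original answer, assuming the values always run from 1 to `n_classes` (an illustrative name, here 3): rows 1..3 are just a 3x3 identity matrix with its rows reversed, and the dummy row 0 is stacked on top.

```python
import numpy as np

# Row i of the table is the target vector for value i.
# Row 0 is a dummy (0 is never mapped); rows 1..3 are the rows of a
# 3x3 identity matrix in reverse order: [0,0,1], [0,1,0], [1,0,0].
n_classes = 3
mapping = np.vstack([np.zeros(n_classes, dtype=int),
                     np.eye(n_classes, dtype=int)[::-1]])

arr = np.array([1, 1, 2, 3, 3, 3])
result = mapping[arr]  # fancy indexing picks one table row per element
print(result)
```

The indexing step is unchanged; only the construction of `mapping` differs, which is convenient if the number of classes ever changes.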
You can actually just compare them and set the appropriate items:
>>> # a bit shorter so it's easier to demonstrate
>>> arr = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
>>> arr2 = np.zeros([arr.size, 3], arr.dtype)
>>> arr2[:, 0] = arr == 3
>>> arr2[:, 1] = arr == 2
>>> arr2[:, 2] = arr == 1
>>> arr2
array([[0, 0, 1],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0]])
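The three column assignments can also be folded into a single broadcast comparison. This is a sketch of the same idea, not the answer's own code; `targets` is an illustrative name holding the value each output column tests for.

```python
import numpy as np

arr = np.array([1, 1, 2, 2, 2, 3, 3])

# Column j should be 1 where arr equals targets[j]:
# column 0 <-> value 3, column 1 <-> value 2, column 2 <-> value 1.
targets = np.array([3, 2, 1])

# arr[:, None] has shape (n, 1) and targets has shape (3,), so the
# comparison broadcasts to an (n, 3) boolean array in one step.
one_hot = (arr[:, None] == targets).astype(arr.dtype)
print(one_hot)
```

This builds a temporary boolean array of shape `(n, 3)` before the cast, so it is not necessarily faster than the column-by-column version; it is mainly shorter.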
You said you were interested in efficiency, so I did some timings:
# Run this in IPython/Jupyter (%timeit is an IPython magic).
# mine_numba is the numba function from the end of this answer;
# define it first, otherwise the checks below raise a NameError.
import numpy as np

my_dict = {
    1: [0, 0, 1],
    2: [0, 1, 0],
    3: [1, 0, 0],
}
mapping = np.array([[0,0,0],[0,0,1],[0,1,0],[1,0,0]])

def mine(arr):
    arr2 = np.zeros([arr.size, 3], arr.dtype)
    arr2[:, 0] = arr == 3
    arr2[:, 1] = arr == 2
    arr2[:, 2] = arr == 1
    return arr2

def JoaoAreias(arr):
    # returns a list of lists, so it is only timed, not checked
    return [my_dict[i] for i in arr]

def JohnZwinck(arr):
    return mapping[arr]

def Divakar(arr):
    return (arr == np.arange(3,0,-1)[:,None]).T.astype(np.int8)

def Divakar2(arr):
    return np.take(mapping, arr, axis=0)

# Comment after each %timeit: speed rank at this size, then the result.
arr = np.random.randint(1, 4, (150))
np.testing.assert_array_equal(mine(arr), JohnZwinck(arr))
np.testing.assert_array_equal(mine(arr), mine_numba(arr))
np.testing.assert_array_equal(mine(arr), Divakar(arr))
np.testing.assert_array_equal(mine(arr), Divakar2(arr))
%timeit mine(arr)        # 5. - 10000 loops, best of 3: 48.3 µs per loop
%timeit JoaoAreias(arr)  # 6. - 10000 loops, best of 3: 179 µs per loop
%timeit JohnZwinck(arr)  # 3. - 10000 loops, best of 3: 24.1 µs per loop
%timeit mine_numba(arr)  # 1. - 100000 loops, best of 3: 6.02 µs per loop
%timeit Divakar(arr)     # 4. - 10000 loops, best of 3: 34.2 µs per loop
%timeit Divakar2(arr)    # 2. - 100000 loops, best of 3: 13.5 µs per loop

arr = np.random.randint(1, 4, (10000))
np.testing.assert_array_equal(mine(arr), JohnZwinck(arr))
np.testing.assert_array_equal(mine(arr), mine_numba(arr))
np.testing.assert_array_equal(mine(arr), Divakar(arr))
np.testing.assert_array_equal(mine(arr), Divakar2(arr))
%timeit mine(arr)        # 4. - 1000 loops, best of 3: 201 µs per loop
%timeit JoaoAreias(arr)  # 6. - 100 loops, best of 3: 10.2 ms per loop
%timeit JohnZwinck(arr)  # 5. - 1000 loops, best of 3: 455 µs per loop
%timeit mine_numba(arr)  # 1. - 10000 loops, best of 3: 103 µs per loop
%timeit Divakar(arr)     # 3. - 10000 loops, best of 3: 155 µs per loop
%timeit Divakar2(arr)    # 2. - 10000 loops, best of 3: 146 µs per loop
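So which one to prefer depends on your data size. Between the two plain NumPy answers I compared first, @JohnZwinck's indexing is faster for small arrays (24.1 µs vs. 48.3 µs at 150 elements), while my comparison approach wins for "bigger" ones (201 µs vs. 455 µs at 10000 elements). In both runs, however, Divakar2 (np.take) beats both of them, and only the numba version is faster still. :)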
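Because `%timeit` only exists inside IPython, the timing code cannot be run as a plain script. A minimal sketch with the standard-library `timeit` module, restricted to two of the NumPy variants so it has no numba dependency; the helper `best_time_us` and the snake_case function names are illustrative, and absolute numbers will vary with machine and NumPy version, so no expected output is given.

```python
import timeit

import numpy as np

# Two of the pure-NumPy variants from the timings, repeated so the
# script is self-contained.
mapping = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])

def john_zwinck(arr):
    return mapping[arr]

def divakar2(arr):
    return np.take(mapping, arr, axis=0)

def best_time_us(func, arr, number=1000, repeat=3):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    times = timeit.repeat(lambda: func(arr), number=number, repeat=repeat)
    return min(times) / number * 1e6

results = {}
for size in (150, 10000):
    arr = np.random.randint(1, 4, size)
    # Sanity check: both variants must produce identical output
    np.testing.assert_array_equal(john_zwinck(arr), divakar2(arr))
    for func in (john_zwinck, divakar2):
        t = best_time_us(func, arr)
        results[(func.__name__, size)] = t
        print(f"{func.__name__:12s} n={size:6d}: {t:.1f} µs per call")
```

Taking the minimum of the repeats mirrors the "best of 3" figures quoted above; newer IPython versions report mean ± standard deviation instead.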
Actually, if you're going to use something like numba (or alternatively Cython or similar), you can beat all the other approaches:
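import numba as nb
import numpy as np

@nb.njit
def mine_numba(arr):
    arr2 = np.zeros((arr.size, 3), arr.dtype)
    # One pass over the input, setting a single 1 per output row
    for idx in range(arr.size):
        item = arr[idx]
        if item == 1:
            arr2[idx, 2] = 1
        elif item == 2:
            arr2[idx, 1] = 1
        else:  # assumes the only remaining value is 3
            arr2[idx, 0] = 1
    return arr2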
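All of these approaches assume every value is 1, 2 or 3. The numba loop sends anything else to column 0, so a stray 0 or 4 silently becomes [1,0,0]; the lookup-table version maps 0 to the dummy row [0,0,0], wraps negative values around to the last rows, and only raises an IndexError for values of 4 or more. If the input might contain out-of-range values, a cheap check up front catches them. A sketch using the lookup table; `to_one_hot_checked` is a hypothetical helper name.

```python
import numpy as np

mapping = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])

def to_one_hot_checked(arr):
    """Map 1 -> [0,0,1], 2 -> [0,1,0], 3 -> [1,0,0], rejecting anything else.

    Hypothetical helper: the approaches in this thread do not validate
    their input themselves.
    """
    arr = np.asarray(arr)
    # min/max are O(n), cheap compared with building the output
    if arr.size and (arr.min() < 1 or arr.max() > 3):
        raise ValueError("all values must be in the range 1..3")
    return mapping[arr]

print(to_one_hot_checked([1, 2, 3]))
```

Calling `to_one_hot_checked([0, 1])` raises `ValueError` instead of returning a row of zeros.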