Converting an array of integers to a "vector"

I have an array of integers of length 150 and the integers range from 1 to 3. For example,

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

I would like to convert/map/transform

1 to [0,0,1]

2 to [0,1,0]

3 to [1,0,0]

Is there an efficient way to do that?

So the output would look like

[0,0,1],[0,0,1],[0,0,1]...[1,0,0]
asked Jan 26 '17 by misheekoh



2 Answers

First, encode your mapping as a lookup array (with a dummy first row, since you never map 0):

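>>> import numpy as np
>>> mapping = np.array([[0,0,0],[0,0,1],[0,1,0],[1,0,0]])

Then it's trivial. Indexing the table with your integer array picks out row k for every value k:

>>> arr = np.array([1,1,2,3,3,3])
>>> mapping[arr]
array([[0, 0, 1],
       [0, 0, 1],
       [0, 1, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0]])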
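A variant of the same lookup idea, sketched under the assumption that every value is in 1..3: build the table from an identity matrix instead of typing it out, and shift the indices so no dummy row is needed. The helper name `to_vectors` and its `n_classes` parameter are hypothetical, and the range check is an addition that guards against values that would otherwise raise `IndexError` or silently wrap around via negative indexing.

```python
import numpy as np


def to_vectors(arr, n_classes=3):
    """Map each value k in 1..n_classes to a one-hot row with the 1 in column n_classes - k."""
    arr = np.asarray(arr)
    # Guard against values that would raise IndexError (0) or silently wrap (> n_classes).
    if arr.size and (arr.min() < 1 or arr.max() > n_classes):
        raise ValueError("all values must lie in 1..n_classes")
    # Identity matrix: row i has its single 1 in column i.
    table = np.eye(n_classes, dtype=arr.dtype)
    # Value k selects row n_classes - k, so 1 -> [0, 0, 1] and 3 -> [1, 0, 0] for n_classes=3.
    return table[n_classes - arr]


arr = np.array([1, 1, 2, 3, 3, 3])
print(to_vectors(arr))
```

It returns the same rows as `mapping[arr]` for valid input; the only design difference is that the table is generated rather than written out by hand.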
answered Sep 30 '22 by John Zwinck


You can actually just compare the array with each value and set the matching column:

>>> import numpy as np
>>> # a shorter array so the result is easier to show
>>> arr = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
>>> arr2 = np.zeros([arr.size, 3], arr.dtype)
>>> arr2[:, 0] = arr == 3
>>> arr2[:, 1] = arr == 2
>>> arr2[:, 2] = arr == 1

>>> arr2
array([[0, 0, 1],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0]])
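The three column assignments above can also be folded into a single broadcast comparison against the column labels `[3, 2, 1]`. A minimal sketch, assuming a 1-D input array; `compare_onehot` is a hypothetical name, and it computes the same thing as the `Divakar` function in the timing code, just without the transpose.

```python
import numpy as np


def compare_onehot(arr):
    # Column labels in output order: column 0 <-> 3, column 1 <-> 2, column 2 <-> 1.
    labels = np.array([3, 2, 1])
    # arr[:, None] has shape (n, 1); comparing it with shape (3,) broadcasts to (n, 3) booleans.
    return (arr[:, None] == labels).astype(arr.dtype)


arr = np.array([1, 1, 2, 2, 2, 3])
print(compare_onehot(arr))
```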

You said you were interested in efficiency, so I did some timings:

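import numpy as np

# Run this in IPython/Jupyter: %timeit is an IPython magic.
# mine_numba is the numba function at the end of this answer; define it before running the checks.
my_dict = {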
    1:[0,0,1],
    2:[0,1,0],
    3:[1,0,0]
    }

mapping = np.array([[0,0,0],[0,0,1],[0,1,0],[1,0,0]])

def mine(arr):
    arr2 = np.zeros([arr.size, 3], arr.dtype)
    arr2[:, 0] = arr == 3
    arr2[:, 1] = arr == 2
    arr2[:, 2] = arr == 1
    return arr2

def JoaoAreias(arr):
    return [my_dict[i] for i in arr]

def JohnZwinck(arr):
    return mapping[arr]

def Divakar(arr):
    return (arr == np.arange(3,0,-1)[:,None]).T.astype(np.int8)

def Divakar2(arr):
    return np.take(mapping, arr,axis=0)

arr = np.random.randint(1, 4, (150))
np.testing.assert_array_equal(mine(arr), JohnZwinck(arr))
np.testing.assert_array_equal(mine(arr), mine_numba(arr))
np.testing.assert_array_equal(mine(arr), Divakar(arr))
np.testing.assert_array_equal(mine(arr), Divakar2(arr))
%timeit mine(arr)        # 5. - 10000 loops, best of 3: 48.3 µs per loop
%timeit JoaoAreias(arr)  # 6. - 10000 loops, best of 3: 179 µs per loop
%timeit JohnZwinck(arr)  # 3. - 10000 loops, best of 3: 24.1 µs per loop
%timeit mine_numba(arr)  # 1. - 100000 loops, best of 3: 6.02 µs per loop
%timeit Divakar(arr)     # 4. - 10000 loops, best of 3: 34.2 µs per loop
%timeit Divakar2(arr)    # 2. - 100000 loops, best of 3: 13.5 µs per loop

arr = np.random.randint(1, 4, (10000))
np.testing.assert_array_equal(mine(arr), JohnZwinck(arr))
np.testing.assert_array_equal(mine(arr), mine_numba(arr))
np.testing.assert_array_equal(mine(arr), Divakar(arr))
np.testing.assert_array_equal(mine(arr), Divakar2(arr))
%timeit mine(arr)        # 4. - 1000 loops, best of 3: 201 µs per loop
%timeit JoaoAreias(arr)  # 6. - 100 loops, best of 3: 10.2 ms per loop
%timeit JohnZwinck(arr)  # 5. - 1000 loops, best of 3: 455 µs per loop
%timeit mine_numba(arr)  # 1. - 10000 loops, best of 3: 103 µs per loop
%timeit Divakar(arr)     # 3. - 10000 loops, best of 3: 155 µs per loop
%timeit Divakar2(arr)    # 2. - 10000 loops, best of 3: 146 µs per loop

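So which approach to prefer depends on your data size. Among the plain NumPy approaches, indexing into a lookup table (@JohnZwinck's `mapping[arr]`, or `np.take`) does well for small arrays, while plain fancy indexing falls behind the comparison-based approaches for "bigger" arrays; `np.take` was the fastest pure-NumPy option at both sizes in these runs. :)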
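If you want to reproduce a comparison like this outside IPython (where `%timeit` is not available), the standard-library `timeit` module does the same job. A sketch covering three of the pure-NumPy approaches; the names `compare_columns`, `fancy_index`, `take` and `benchmark` are hypothetical stand-ins for `mine`, `JohnZwinck` and `Divakar2`, and absolute numbers will vary with machine and NumPy version.

```python
import timeit

import numpy as np

# Lookup table with a dummy row 0, as in the mapping[arr] approach.
mapping = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])


def compare_columns(arr):
    arr2 = np.zeros((arr.size, 3), arr.dtype)
    arr2[:, 0] = arr == 3
    arr2[:, 1] = arr == 2
    arr2[:, 2] = arr == 1
    return arr2


def fancy_index(arr):
    return mapping[arr]


def take(arr):
    return np.take(mapping, arr, axis=0)


def benchmark(size, number=200, seed=0):
    """Return the best-of-5 seconds per call for each approach on `size` random values in 1..3."""
    rng = np.random.default_rng(seed)
    arr = rng.integers(1, 4, size)  # upper bound is exclusive, so values are 1, 2 or 3
    funcs = {"compare_columns": compare_columns, "fancy_index": fancy_index, "take": take}
    results = {}
    for name, func in funcs.items():
        # Check that every approach agrees before timing anything.
        np.testing.assert_array_equal(func(arr), compare_columns(arr))
        best = min(timeit.repeat(lambda: func(arr), repeat=5, number=number))
        results[name] = best / number
    return results


if __name__ == "__main__":
    for size in (150, 10_000):
        for name, sec in benchmark(size).items():
            print(f"size={size:>6} {name:>16}: {sec * 1e6:8.2f} µs per call")
```

Taking the minimum of several repeats mirrors what `%timeit` reported as "best of 3" above.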


Actually, if you're willing to use something like numba (or alternatively Cython or similar), you can beat all the other approaches:

import numba as nb
import numpy as np

@nb.njit
def mine_numba(arr):
    arr2 = np.zeros((arr.size, 3), arr.dtype)
    for idx in range(arr.size):
        item = arr[idx]
        if item == 1:
            arr2[idx, 2] = 1
        elif item == 2:
            arr2[idx, 1] = 1
        else:
            arr2[idx, 0] = 1
    return arr2
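The loop in `mine_numba` sets exactly one column per row, so without numba the same scatter can be written with NumPy fancy-index assignment. A sketch assuming a 1-D array whose values are all in 1..3 (unlike the loop's `else` branch, other values are not sent to column 0); `scatter_onehot` is a hypothetical name and this variant was not part of the timings above.

```python
import numpy as np


def scatter_onehot(arr):
    # Same rule as mine_numba for values 1..3: value k puts a 1 in column 3 - k.
    arr2 = np.zeros((arr.size, 3), arr.dtype)
    # Pair row index i with column 3 - arr[i] and set all those cells in one assignment.
    arr2[np.arange(arr.size), 3 - arr] = 1
    return arr2


arr = np.array([1, 1, 2, 3, 3, 3])
print(scatter_onehot(arr))
```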
answered Sep 30 '22 by MSeifert