Converting an array of integers to a "vector"

Tags: python, arrays, numpy

I have an array of integers of length 150 and the integers range from 1 to 3. For example,

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

I would like to convert/map/transform

1 to [0,0,1]

2 to [0,1,0]

3 to [1,0,0]

Is there an efficient way to do that?

So the output looks like

[0,0,1],[0,0,1],[0,0,1]...[1,0,0]

asked Jan 26 '17 03:01

misheekoh

2 Answers

First, encode your transform as an array (with a dummy first element since you don't map 0):

>>> mapping = np.array([[0,0,0],[0,0,1],[0,1,0],[1,0,0]])

Then it's trivial:

>>> arr = np.array([1,1,2,3,3,3])
>>> mapping[arr]
array([[0, 0, 1],
       [0, 0, 1],
       [0, 1, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0]])
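
If the dummy first row bothers you, one possible variation (a sketch, not part of the answer above) is to build the table from a reversed identity matrix and shift the indices by one. The name `lookup` is just an illustrative choice:

```python
import numpy as np

# Reversed 3x3 identity: row 0 -> [0, 0, 1], row 1 -> [0, 1, 0], row 2 -> [1, 0, 0]
lookup = np.eye(3, dtype=int)[::-1]

arr = np.array([1, 1, 2, 3, 3, 3])

# The values start at 1, so subtract 1 to get zero-based row indices.
result = lookup[arr - 1]
print(result)
```

This assumes every value in `arr` is 1, 2 or 3. A 0 would silently wrap around to the last row and a 4 would raise an IndexError.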

answered Sep 30 '22 09:09

John Zwinck

You can actually just compare them and set the appropriate items:

>>> # a bit shorter so it's easier to demonstrate
>>> arr = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
>>> arr2 = np.zeros([arr.size, 3], arr.dtype)
>>> arr2[:, 0] = arr == 3
>>> arr2[:, 1] = arr == 2
>>> arr2[:, 2] = arr == 1

>>> arr2
array([[0, 0, 1],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0]])
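
As a hedged aside, the three column assignments can probably be collapsed into a single indexed assignment, since each row gets exactly one 1 and its column is `3 - value`. This is only a sketch of that idea and is not one of the variants timed in this answer:

```python
import numpy as np

arr = np.array([1, 1, 2, 2, 2, 3, 3])

# Start from all zeros, one row per input element.
arr2 = np.zeros((arr.size, 3), dtype=arr.dtype)

# For row i, set column 3 - arr[i]: 1 -> column 2, 2 -> column 1, 3 -> column 0.
arr2[np.arange(arr.size), 3 - arr] = 1
print(arr2)
```

Like the comparison version, it assumes the input only contains 1, 2 and 3.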

You said you were interested in efficiency, so I did some timings:

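import numpy as np

# Plain-dict lookup used by the list-comprehension variant
my_dict = {
    1: [0, 0, 1],
    2: [0, 1, 0],
    3: [1, 0, 0],
}

# Lookup table; row 0 is a dummy because 0 is never mapped
mapping = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])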

def mine(arr):
    arr2 = np.zeros([arr.size, 3], arr.dtype)
    arr2[:, 0] = arr == 3
    arr2[:, 1] = arr == 2
    arr2[:, 2] = arr == 1
    return arr2

def JoaoAreias(arr):
    return [my_dict[i] for i in arr]

def JohnZwinck(arr):
    return mapping[arr]

def Divakar(arr):
    return (arr == np.arange(3,0,-1)[:,None]).T.astype(np.int8)

def Divakar2(arr):
    return np.take(mapping, arr,axis=0)

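# mine_numba is the numba-compiled function defined at the end of this answer;
# define it first if you run this block top to bottom.
arr = np.random.randint(1, 4, 150)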
np.testing.assert_array_equal(mine(arr), JohnZwinck(arr))
np.testing.assert_array_equal(mine(arr), mine_numba(arr))
np.testing.assert_array_equal(mine(arr), Divakar(arr))
np.testing.assert_array_equal(mine(arr), Divakar2(arr))
%timeit mine(arr)        # 5. - 10000 loops, best of 3: 48.3 µs per loop
%timeit JoaoAreias(arr)  # 6. - 10000 loops, best of 3: 179 µs per loop
%timeit JohnZwinck(arr)  # 3. - 10000 loops, best of 3: 24.1 µs per loop
%timeit mine_numba(arr)  # 1. - 100000 loops, best of 3: 6.02 µs per loop
%timeit Divakar(arr)     # 4. - 10000 loops, best of 3: 34.2 µs per loop
%timeit Divakar2(arr)    # 2. - 100000 loops, best of 3: 13.5 µs per loop

arr = np.random.randint(1, 4, 10000)
np.testing.assert_array_equal(mine(arr), JohnZwinck(arr))
np.testing.assert_array_equal(mine(arr), mine_numba(arr))
np.testing.assert_array_equal(mine(arr), Divakar(arr))
np.testing.assert_array_equal(mine(arr), Divakar2(arr))
%timeit mine(arr)        # 4. - 1000 loops, best of 3: 201 µs per loop
%timeit JoaoAreias(arr)  # 6. - 100 loops, best of 3: 10.2 ms per loop
%timeit JohnZwinck(arr)  # 5. - 1000 loops, best of 3: 455 µs per loop
%timeit mine_numba(arr)  # 1. - 10000 loops, best of 3: 103 µs per loop
%timeit Divakar(arr)     # 3. - 10000 loops, best of 3: 155 µs per loop
%timeit Divakar2(arr)    # 2. - 10000 loops, best of 3: 146 µs per loop

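So which to prefer depends on your data size: of the two answers here, @JohnZwinck's indexing is faster for small arrays (24.1 µs vs. 48.3 µs at 150 elements), while for "bigger" datasets my approach wins (201 µs vs. 455 µs at 10000 elements). :) Of the pure NumPy variants timed, np.take (Divakar2) was the fastest at both sizes.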
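
The `%timeit` lines only work in IPython/Jupyter. If you want to rerun a comparison like this from a plain script, a rough sketch with the standard `timeit` module could look as follows. The function names `by_comparison`, `by_indexing` and `by_take` are made up for this example, and the numbers you get will depend on your machine and NumPy version:

```python
import timeit

import numpy as np

mapping = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])


def by_comparison(arr):
    out = np.zeros((arr.size, 3), arr.dtype)
    out[:, 0] = arr == 3
    out[:, 1] = arr == 2
    out[:, 2] = arr == 1
    return out


def by_indexing(arr):
    return mapping[arr]


def by_take(arr):
    return np.take(mapping, arr, axis=0)


candidates = [by_comparison, by_indexing, by_take]
calls = 200  # calls per timing run, kept small so the script finishes quickly

for size in (150, 10000):
    arr = np.random.randint(1, 4, size)
    reference = by_comparison(arr)
    for func in candidates:
        # Check correctness before timing.
        np.testing.assert_array_equal(func(arr), reference)
        # Best of 3 runs, reported per call in microseconds.
        best = min(timeit.repeat(lambda: func(arr), number=calls, repeat=3))
        print(f"{func.__name__:>14} n={size:>5}: {best / calls * 1e6:8.1f} µs per call")
```

Taking the minimum of several repeats mirrors the "best of 3" that `%timeit` reports above.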

Actually, if you're going to use something like numba (or alternatively Cython or similar), you can beat all the other approaches:

import numba as nb

@nb.njit
def mine_numba(arr):
    arr2 = np.zeros((arr.size, 3), arr.dtype)
    for idx in range(arr.size):
        item = arr[idx]
        if item == 1:
            arr2[idx, 2] = 1
        elif item == 2:
            arr2[idx, 1] = 1
        else:
            arr2[idx, 0] = 1
    return arr2

answered Sep 30 '22 10:09

MSeifert
