How to randomly change the signs of some of the elements in a numpy array?

Given a numpy array

import numpy as np
from numpy.random import random
N = 5
x = random(N)

How can I multiply a random subset of the elements of x by -1, so that the signs of those elements change?

develarist asked Oct 23 '25 11:10


2 Answers

One option is np.where:

x = np.where(random(N) > 0.5, -x, x)

(You can replace random(N) > 0.5 with whatever random rule suits you.)
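As a rough illustration of swapping in a different rule, the sketch below flips each element with a probability other than one half. The name p, its value, and the seeded generator are assumptions made for the example, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the example repeatable
x = rng.random(5)

p = 0.3  # hypothetical probability of flipping each element
flipped = np.where(rng.random(x.size) < p, -x, x)

# np.where returns a new array: the magnitudes match x, only signs may differ.
print(x)
print(flipped)
```

Because np.where builds a new array, x itself is left untouched unless you assign the result back to it.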

Julien answered Oct 26 '25 00:10


Let's say you have a boolean mask indicating which elements to flip. Then you can do:

x[mask] *= -1

This method also works with a fancy index:

x[index] *= -1 

You could also use np.negative quite efficiently:

np.negative(x, where=mask, out=x)

This approach is likely the most efficient.
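A quick way to convince yourself that the three in-place variants agree is to run them on copies of the same array. This is only a sketch: the mask here is an assumption, built by thresholding uniform random numbers from a seeded generator.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed is arbitrary
x = rng.random(8)
mask = rng.random(x.size) < 0.5  # assumed rule for choosing elements

a = x.copy()
a[mask] *= -1                      # boolean mask

b = x.copy()
b[np.flatnonzero(mask)] *= -1      # fancy index derived from the same mask

c = x.copy()
np.negative(c, where=mask, out=c)  # ufunc writing into its own input

same = np.array_equal(a, b) and np.array_equal(a, c)
print(same)
```

With out=c, the elements where mask is False simply keep their original values, which is why np.negative can be used in place here.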

Generating a mask is simple. You can encode a simple condition like

mask = np.random.random(N) >= 0.66

Or you can use np.random.choice to select a random fancy index:

index = np.random.choice(N, size=N // 2, replace=False)
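The two ways of selecting elements differ in how many elements they pick. The sketch below, which assumes a seeded np.random.default_rng generator rather than the legacy np.random functions used above, shows that thresholding gives a count that varies around the target while choice gives an exact count.

```python
import numpy as np

rng = np.random.default_rng(2)  # seed is arbitrary
N = 1000

# Each element is selected independently, so the count is only close to N / 2.
mask = rng.random(N) < 0.5

# Exactly N // 2 distinct positions are selected.
index = rng.choice(N, size=N // 2, replace=False)

print(int(mask.sum()), index.size)
```

Keeping replace=False matters for the fancy-index approach: a repeated position in x[index] *= -1 would still be flipped only once, so the number of flipped elements would no longer match the size of index.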

Finally, you can use a real hack and do this with XOR. IEEE 754 stores the sign in the highest bit of a floating-point number, so you can flip the sign by flipping that one bit through an integer view of the floating-point values. This only works for floats, of course.

You could tailor the size of your integers to the size of the float, or simply use np.uint8 with the bit 0x80. With np.uint8, the indices must be scaled by the size of the float, and the sign bit sits in the most significant byte of each element. On a little-endian machine that is the last byte of the element:

x.view(np.uint8)[(index + 1) * x.itemsize - 1] ^= 0x80

On a big-endian machine it is the first byte, so no offset is needed:

x.view(np.uint8)[index * x.itemsize] ^= 0x80
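The other option mentioned above, tailoring the integer size to the float size, avoids the byte-order question, because the sign bit is then always the highest bit of the integer. A minimal sketch, assuming x is a float16, float32 or float64 array in native byte order; the helper name flip_sign_bits is made up for this example.

```python
import numpy as np

def flip_sign_bits(x, index):
    """Flip the sign of x[index] in place by XOR-ing the IEEE 754 sign bit.

    Hypothetical helper: assumes a native-byte-order float16/32/64 array.
    """
    if x.dtype.kind != "f":
        raise TypeError("sign-bit flipping only makes sense for float arrays")
    # Pick the unsigned integer type with the same width as the float type.
    uint_type = {2: np.uint16, 4: np.uint32, 8: np.uint64}[x.dtype.itemsize]
    bits = x.view(uint_type)  # shares memory with x
    sign_bit = uint_type(1) << uint_type(8 * x.dtype.itemsize - 1)
    bits[index] ^= sign_bit

x = np.array([1.5, -2.0, 0.25, 4.0])
flip_sign_bits(x, np.array([0, 1]))
print(x)  # the first two elements have changed sign
```

Since the view shares memory with x, the XOR modifies x directly and no copy is made.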

Timings

Here are some benchmarks I ran on my moderately powered laptop:

import numpy as np
from timeit import repeat

def where(x, mask):
    # Builds a new array and rebinds the local name; the caller's x is unchanged
    x = np.where(mask, -x, x)

def mask_(x, mask):
    x[mask] *= -1

def index(x, mask):
    x[np.flatnonzero(mask)] *= -1 

def negat(x, mask):
    np.negative(x, where=mask, out=x)

def xor__(x, mask):
    # The sign bit is in the last byte of each element on little-endian machines
    x.view(np.uint8)[(np.flatnonzero(mask) + 1) * x.itemsize - 1] ^= 0x80

for E in range(2, 7):
    N = 10**E
    x = np.random.random(N)

    for P in (0.1, 0.5, 0.9):
        mask = np.random.random(N) < P

        print(f'E = {E}, P = {P}:')

        for func in where, mask_, index, negat, xor__:
            B = 10**(7 - E)
            t = min(repeat(lambda: func(x, mask), number=B)) / B
            print(f'{func.__name__}: {t:.3g}')

The results, separated by P:

P = 0.1
+-----+------+---------------------------------------+
|     |      |                  Func                 |
| Exp | Unit +-------+-------+-------+-------+-------+
|     |      | where | mask_ | index | negat | xor__ |
+-----+------+-------+-------+-------+-------+-------+
|  2  |  μs  |  4.11 |  6.20 |  12.5 | *3.06 |  14.9 |
|  3  |  μs  |  7.47 |  12.4 |  15.9 | *5.55 |  18.6 |
|  4  |  μs  | *32.3 |  94.0 |  41.6 |  38.9 |  49.8 |
|  5  |  ms  | *.258 |  1.06 |  .582 |  .575 |  .602 |
|  6  |  ms  |  15.7 |  10.5 |  6.44 | *5.87 |  6.57 |
+-----+------+-------+-------+-------+-------+-------+
P = 0.5
+-----+------+---------------------------------------+
|     |      |                  Func                 |
| Exp | Unit +-------+-------+-------+-------+-------+
|     |      | where | mask_ | index | negat | xor__ |
+-----+------+-------+-------+-------+-------+-------+
|  2  |  μs  |  4.11 |  6.53 |  13.0 | *3.48 |  15.4 |
|  3  |  μs  | *7.42 |  17.1 |  20.1 |  9.71 |  26.0 |
|  4  |  μs  | *32.0 |  234. |  140. |  130. |  150. |
|  5  |  ms  | *.268 |  2.41 |  1.27 |  1.43 |  1.36 |
|  6  |  ms  |  15.5 |  27.7 |  20.5 | *14.2 |  21.1 |
+-----+------+-------+-------+-------+-------+-------+
P = 0.9
+-----+------+---------------------------------------+
|     |      |                  Func                 |
| Exp | Unit +-------+-------+-------+-------+-------+
|     |      | where | mask_ | index | negat | xor__ |
+-----+------+-------+-------+-------+-------+-------+
|  2  |  μs  |  4.11 |  6.23 |  13.2 | *3.13 |  15.7 |
|  3  |  μs  |  7.81 |  15.0 |  23.6 | *6.40 |  28.4 |
|  4  |  μs  | *31.5 |  116. |  104. |  54.8 |  130. |
|  5  |  ms  | *.263 |  1.24 |  .882 |  .612 |  1.02 |
|  6  |  ms  |  16.6 |  18.4 |  21.0 | *6.24 |  22.9 |
+-----+------+-------+-------+-------+-------+-------+

Conclusion: for small arrays (up to about 10^3 elements) and large arrays (around 10^6 elements), np.negative is generally the fastest approach. In between, around 10^4-10^5 elements, np.where dominates. When comparing times, keep in mind that the index and xor__ methods operate on index arrays, so their timings include a call to np.flatnonzero. If you already have an index array as input, subtract the time required to call np.flatnonzero.

In all cases, the proportion of flipped elements, determined by P, has little effect on which method is fastest, even though the absolute times of the mask- and index-based methods do vary with P.

For reference, I've also timed the differences between using np.random.choice to create indices versus using a mask. These times are a bit approximate, since the results of the two operations are not exactly identical:

def thresh(n, p):
    return np.flatnonzero(np.random.random(n) < p)

def choice(n, p):
    return np.random.choice(n, size=round(n * p), replace=False)

for E in range(2, 7):
    N = 10**E
    for P in (0.1, 0.5, 0.9):
        print(f'E = {E}, P = {P}:')
        for func in thresh, choice:
            B = 10**(7 - E)
            t = min(repeat(lambda: func(N, P), number=B)) / B
            print(f'{func.__name__}: {t:.3g}')

Timings (aggregated into a table):

+-----+------+-----------------------------------------------------+
|     |      |                          P                          |
|     |      +-----------------+-----------------+-----------------+
| Exp | Unit |       0.1       |       0.5       |       0.9       |
|     |      +--------+--------+--------+--------+--------+--------+
|     |      | thresh | choice | thresh | choice | thresh | choice |
+-----+------+--------+--------+--------+--------+--------+--------+
|  2  |  μs  |  14.8  |  35.2  |  15.3  |  34.9  |  14.7  |  34.9  |
|  3  |  μs  |  34.2  |  75.8  |  40.6  |  75.5  |  34.8  |  76.0  |
|  4  |  μs  |  214.  |  494.  |  267.  |  494.  |  206.  |  494.  |
|  5  |  ms  |  1.96  |  4.60  |  2.48  |  4.60  |  1.89  |  4.60  | 
|  6  |  ms  |  26.1  |  50.1  |  34.5  |  50.3  |  30.7  |  50.2  |
+-----+------+--------+--------+--------+--------+--------+--------+

Thresholding a random array and calling np.flatnonzero is faster than using np.random.choice in every case measured, by a factor of roughly 1.5 to 2.4. The former approach reproduces the mask-based selection exactly, while the latter allows you to set the exact number of flipped elements.

Mad Physicist answered Oct 26 '25 02:10