
Element-wise broadcasting for comparing two NumPy arrays?

Let's say I have an array like this:

import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4,   2,  2,
                         2,  5,   7,  7,  8,  7,  12, 11])

Suppose I want to know: "how many elements in base_array are greater than 4?" This can be done simply by exploiting broadcasting:

np.sum(4 < base_array)

The answer is 7. Now suppose that, instead of comparing to a single value, I want to compare against a whole array. In other words, for each value c in comparison_array, find out how many elements of base_array are greater than c. The naive approach fails because the two 1-D arrays have different lengths and cannot be broadcast against each other:

comparison_array = np.arange(-13, 13)
comparison_result = np.sum(comparison_array < base_array)

Output:

Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    np.sum(comparison_array < base_array)
ValueError: operands could not be broadcast together with shapes (26,) (16,) 

If I could somehow have each element of comparison_array get broadcast to base_array's shape, that would solve this. But I don't know how to do such an "element-wise broadcasting".

Now, I do know how to implement this for both cases using list comprehensions:

first = sum([4 < i for i in base_array])
second = [sum([c < i for i in base_array])
          for c in comparison_array]
print(first)
print(second)

Output:

7
[15, 15, 14, 14, 13, 13, 13, 13, 13, 12, 10, 10, 10, 10, 10, 7, 7, 7, 6, 6, 3, 2, 2, 2, 1, 0]

But as we all know, this will be orders of magnitude slower than a correctly-vectorized numpy implementation on larger arrays. So, how should I do this in numpy so that it's fast? Ideally this solution should extend to any kind of operation where broadcasting works, not just greater-than or less-than in this example.

dain asked Aug 06 '18 16:08




2 Answers

You can simply add a dimension to the comparison array, so that the comparison is "stretched" across all values along the new dimension.

>>> np.sum(comparison_array[:, None] < base_array)
228

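The 228 above is the total over the whole (26, 16) comparison table. Adding a unit dimension like this is the fundamental principle of broadcasting, and it works for any element-wise operation.

To get one count per value in comparison_array, as in the question, specify the axis along which to sum after the comparison.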

>>> np.sum(comparison_array[:, None] < base_array, axis=1)
array([15, 15, 14, 14, 13, 13, 13, 13, 13, 12, 10, 10, 10, 10, 10,  7,  7,
        7,  6,  6,  3,  2,  2,  2,  1,  0])
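
As a rough sketch of how the same pattern extends beyond less-than, the snippet below reuses the question's arrays with two other element-wise operations. The names col, equal_counts, nearest_gap and outer_counts are just illustrative. It also shows np.less.outer, which should build the same (26, 16) table without the explicit new axis.

```python
import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4, 2, 2,
                       2, 5, 7, 7, 8, 7, 12, 11])
comparison_array = np.arange(-13, 13)

# A (26, 1) column against a (16,) row broadcasts to a (26, 16) table.
col = comparison_array[:, None]

# How many elements of base_array are equal to each c.
equal_counts = (col == base_array).sum(axis=1)

# Distance from each c to the nearest element of base_array.
nearest_gap = np.abs(col - base_array).min(axis=1)

# Same counts as the less-than example, built with the ufunc's outer method.
outer_counts = np.less.outer(comparison_array, base_array).sum(axis=1)

print(equal_counts)
print(nearest_gap)
print(outer_counts)
```

Keep in mind that the intermediate table holds len(comparison_array) * len(base_array) elements, so for very large inputs memory can become the limiting factor.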
miradulo answered Oct 20 '22 01:10


You will want to add a unit dimension to one of the arrays, in effect turning it into a column, for broadcasting to work correctly (transposing a 1-D array does nothing). When you broadcast two arrays together, the dimensions are lined up and any unit dimensions are effectively expanded to the non-unit size that they match. So two arrays of shape (16, 1) (the original array) and (1, 26) (the comparison array) would broadcast to (16, 26).
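
If you want to check the shape arithmetic without computing anything, np.broadcast reports the shape a set of operands would expand to. This is only a small sketch using the question's arrays; the names a, b and failed are illustrative.

```python
import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4, 2, 2,
                       2, 5, 7, 7, 8, 7, 12, 11])
comparison_array = np.arange(-13, 13)

a = base_array[:, None]        # shape (16, 1)
b = comparison_array[None, :]  # shape (1, 26)

# Each unit dimension is stretched to match the other operand.
result_shape = np.broadcast(a, b).shape
print(result_shape)

# Two 1-D arrays of lengths 16 and 26 have no unit dimension to stretch,
# so broadcasting them raises ValueError.
failed = False
try:
    np.broadcast(base_array, comparison_array)
except ValueError as exc:
    failed = True
    print("cannot broadcast:", exc)
```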

Don't forget to sum across the dimension of size 16:

(base_array[:, None] > comparison_array).sum(axis=0)

None in an index is equivalent to np.newaxis: it's one of many ways to insert a new unit dimension at the specified position. The reason that you don't need to write comparison_array[None, :] is that broadcasting aligns shapes starting from the trailing (last) dimension and automatically prepends unit dimensions to the array that has fewer of them, so a (26,) array is treated as (1, 26).
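
Here is a short sketch to confirm both points with the question's arrays: that None and np.newaxis are interchangeable, and that the explicit comparison_array[None, :] gives the same table as leaving the leading dimension to be filled in automatically. The names col, explicit, implicit and counts are illustrative.

```python
import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4, 2, 2,
                       2, 5, 7, 7, 8, 7, 12, 11])
comparison_array = np.arange(-13, 13)

# np.newaxis is simply an alias for None.
same_object = np.newaxis is None

col = base_array[:, np.newaxis]              # shape (16, 1)

explicit = col > comparison_array[None, :]   # (16, 1) against (1, 26)
implicit = col > comparison_array            # (16, 1) against (26,)

tables_match = np.array_equal(explicit, implicit)

# Collapse the size-16 axis to get one count per comparison value.
counts = implicit.sum(axis=0)

print(same_object, tables_match)
print(implicit.shape, counts.shape)
print(counts)
```

Summing over axis 0 here collapses the base_array dimension, which leaves 26 counts, one per element of comparison_array.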

Mad Physicist answered Oct 20 '22 01:10