Let's say I have an array like this:
import numpy as np
base_array = np.array([-13, -9, -11, -3, -3, -4, 2, 2,
2, 5, 7, 7, 8, 7, 12, 11])
Suppose I want to know: "how many elements in base_array are greater than 4?" This can be done simply by exploiting broadcasting:
np.sum(4 < base_array)
For which the answer is 7. Now, suppose instead of comparing to a single value, I want to do this over an array. In other words, for each value c in the comparison_array, find out how many elements of base_array are greater than c. If I do this the naive way, it obviously fails because it doesn't know how to broadcast it properly:
comparison_array = np.arange(-13, 13)
comparison_result = np.sum(comparison_array < base_array)
Output:
Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    np.sum(comparison_array < base_array)
ValueError: operands could not be broadcast together with shapes (26,) (16,)
If I could somehow have each element of comparison_array get broadcast to base_array's shape, that would solve this. But I don't know how to do such an "element-wise broadcasting".
Now, I do know how to implement this for both cases using list comprehensions:
first = sum([4 < i for i in base_array])
second = [sum([c < i for i in base_array])
for c in comparison_array]
print(first)
print(second)
Output:
7
[15, 15, 14, 14, 13, 13, 13, 13, 13, 12, 10, 10, 10, 10, 10, 7, 7, 7, 6, 6, 3, 2, 2, 2, 1, 0]
But as we all know, this will be orders of magnitude slower than a correctly vectorized numpy implementation on larger arrays. So, how should I do this in numpy so that it's fast? Ideally this solution should extend to any kind of operation where broadcasting works, not just greater-than or less-than in this example.
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
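Broadcasting rules: shapes are compared dimension by dimension, starting from the trailing (rightmost) one, and a missing leading dimension is treated as size 1. Two arrays are compatible in a dimension if they have the same size in that dimension or if one of them has size 1 in that dimension. The arrays can be broadcast together if and only if they are compatible in all dimensions.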
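As a rough illustration of the compatibility rule, the sketch below checks the two shapes from the question with np.broadcast, which works out the broadcast shape without doing any arithmetic and raises ValueError when the shapes are incompatible. The names compatible, column, and result_shape are only illustrative.

```python
import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4, 2, 2,
                       2, 5, 7, 7, 8, 7, 12, 11])
comparison_array = np.arange(-13, 13)

# Shapes (26,) and (16,): the trailing sizes differ and neither is 1,
# so the arrays are not compatible.
try:
    np.broadcast(comparison_array, base_array)
    compatible = True
except ValueError:
    compatible = False

# Shapes (26, 1) and (16,): compared from the right, 1 stretches to 16,
# and the missing leading dimension of base_array is treated as 1.
column = comparison_array[:, None]
result_shape = np.broadcast(column, base_array).shape

print(compatible)    # False
print(result_shape)  # (26, 16)
```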
You can simply add a dimension to the comparison array, so that the comparison is "stretched" across all values along the new dimension.
>>> np.sum(comparison_array[:, None] < base_array)
228
This is the fundamental principle with broadcasting, and works for all kinds of operations.
If you need the sum done along an axis, you just specify the axis along which you want to sum after the comparison.
>>> np.sum(comparison_array[:, None] < base_array, axis=1)
array([15, 15, 14, 14, 13, 13, 13, 13, 13, 12, 10, 10, 10, 10, 10, 7, 7,
7, 6, 6, 3, 2, 2, 2, 1, 0])
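Since the question asks for something that extends beyond greater-than, here is a minimal sketch of the same added-axis pattern with a different operation. The "within 2 of c" criterion and the names diff, near_counts, and greater_counts are made up for illustration. Note that the intermediate array has one entry per pair of elements, so memory grows with the product of the two lengths.

```python
import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4, 2, 2,
                       2, 5, 7, 7, 8, 7, 12, 11])
comparison_array = np.arange(-13, 13)

# Pairwise table: row i holds comparison_array[i] minus every base element.
diff = comparison_array[:, None] - base_array  # shape (26, 16)

# A different reduction on the same table: for each c, count the base
# elements whose distance from c is at most 2.
near_counts = np.sum(np.abs(diff) <= 2, axis=1)

# Binary ufuncs also provide .outer, which builds the same kind of
# pairwise table without indexing with None.
greater_counts = np.less.outer(comparison_array, base_array).sum(axis=1)

print(diff.shape)
print(near_counts)
print(greater_counts)
```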
You will want to transpose one of the arrays for broadcasting to work correctly. When you broadcast two arrays together, the dimensions are lined up and any unit dimensions are effectively expanded to the non-unit size that they match. So two arrays of size (16, 1) (the original array) and (1, 26) (the comparison array) would broadcast to (16, 26).
Don't forget to sum across the dimension of size 16, which is axis 0 of the (16, 26) result:
(base_array[:, None] > comparison_array).sum(axis=0)
None in a slice is equivalent to np.newaxis: it's one of many ways to insert a new unit dimension at the specified index. The reason that you don't need to do comparison_array[None, :] is that broadcasting aligns the trailing dimensions and automatically fills in any missing leading dimensions with ones.
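To see how the orientation of the added axis interacts with the axis argument, the sketch below computes the counts both ways, each time reducing over the axis of length 16. The names rows, cols, and explicit are illustrative.

```python
import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4, 2, 2,
                       2, 5, 7, 7, 8, 7, 12, 11])
comparison_array = np.arange(-13, 13)

# (26, 1) against (16,) gives (26, 16); the base elements run along axis 1.
rows = np.sum(comparison_array[:, None] < base_array, axis=1)

# (16, 1) against (26,) gives (16, 26); the base elements run along axis 0.
cols = np.sum(base_array[:, None] > comparison_array, axis=0)

# Spelling out both unit dimensions with np.newaxis changes nothing.
explicit = np.sum(base_array[:, np.newaxis] > comparison_array[np.newaxis, :],
                  axis=0)

print(np.newaxis is None)            # True
print(np.array_equal(rows, cols))    # True
print(np.array_equal(cols, explicit))  # True
```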