Can numpy bincount work with 2D arrays?

Tags:

python

arrays

numpy

I am seeing behaviour with numpy bincount that I cannot make sense of. I want to bin the values in a 2D array in a row-wise manner and see the behaviour below. Why would it work with dbArray but fail with simarray?

>>> dbArray
array([[1, 0, 1, 0, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 0, 1, 1],
       [1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1],
       [0, 1, 0, 1, 0]])
>>> N.apply_along_axis(N.bincount,1,dbArray)
array([[2, 3],
       [0, 5],
       [1, 4],
       [4, 1],
       [3, 2],
       [3, 2]], dtype=int64)
>>> simarray
array([[2, 0, 2, 0, 2],
       [2, 1, 2, 1, 2],
       [2, 1, 1, 1, 2],
       [2, 0, 1, 0, 1],
       [1, 0, 1, 1, 2],
       [1, 1, 1, 1, 1]])
>>> N.apply_along_axis(N.bincount,1,simarray)

Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    N.apply_along_axis(N.bincount,1,simarray)
  File "C:\Python27\lib\site-packages\numpy\lib\shape_base.py", line 118, in apply_along_axis
    outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (3)

asked Oct 05 '13 19:10

James

3 Answers

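The problem is that bincount doesn't always return arrays of the same length. Its output has one entry per value from 0 up to the largest value in the input, so a row that lacks the largest value produces a shorter result. For example: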

>>> m = np.array([[0,0,1],[1,1,0],[1,1,1]])
>>> np.apply_along_axis(np.bincount, 1, m)
array([[2, 1],
       [1, 2],
       [0, 3]])
>>> [np.bincount(m[i]) for i in range(m.shape[0])]
[array([2, 1]), array([1, 2]), array([0, 3])]

works, but:

>>> m = np.array([[0,0,0],[1,1,0],[1,1,0]])
>>> m
array([[0, 0, 0],
       [1, 1, 0],
       [1, 1, 0]])
>>> [np.bincount(m[i]) for i in range(m.shape[0])]
[array([3]), array([1, 2]), array([1, 2])]
>>> np.apply_along_axis(np.bincount, 1, m)
Traceback (most recent call last):
  File "<ipython-input-49-72e06e26a718>", line 1, in <module>
    np.apply_along_axis(np.bincount, 1, m)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 117, in apply_along_axis
    outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (1)

won't.
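The same check can be run on the two arrays from the question. The following is a small sketch (the names db_lengths and sim_lengths are only illustrative) that prints the length of each row's bincount. Every row of dbArray contains a 1 and nothing larger, so all results have length 2, while the last row of simarray contains no 2 and comes out shorter than the rest.

```python
import numpy as np

dbArray = np.array([[1, 0, 1, 0, 1],
                    [1, 1, 1, 1, 1],
                    [1, 1, 0, 1, 1],
                    [1, 0, 0, 0, 0],
                    [0, 0, 0, 1, 1],
                    [0, 1, 0, 1, 0]])
simarray = np.array([[2, 0, 2, 0, 2],
                     [2, 1, 2, 1, 2],
                     [2, 1, 1, 1, 2],
                     [2, 0, 1, 0, 1],
                     [1, 0, 1, 1, 2],
                     [1, 1, 1, 1, 1]])

# Length of each per-row result: (largest value in the row) + 1
db_lengths = [len(np.bincount(row)) for row in dbArray]
sim_lengths = [len(np.bincount(row)) for row in simarray]

print(db_lengths)   # [2, 2, 2, 2, 2, 2] -> rows can be stacked
print(sim_lengths)  # [3, 3, 3, 3, 3, 2] -> last row does not fit
```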

You can fix this with the minlength parameter, which pads every result to at least that many bins. Pass it using a lambda or functools.partial:

>>> np.apply_along_axis(lambda x: np.bincount(x, minlength=2), axis=1, arr=m)
array([[3, 0],
       [1, 2],
       [1, 2]])
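For the simarray from the question, the number of bins does not have to be hard-coded. As a sketch, assuming the values are non-negative integers, it can be derived from the array's maximum. Since apply_along_axis forwards extra keyword arguments to the function it applies, minlength can be passed directly; n_bins, counts and counts_partial are illustrative names.

```python
import numpy as np
from functools import partial

simarray = np.array([[2, 0, 2, 0, 2],
                     [2, 1, 2, 1, 2],
                     [2, 1, 1, 1, 2],
                     [2, 0, 1, 0, 1],
                     [1, 0, 1, 1, 2],
                     [1, 1, 1, 1, 1]])

# One bin per value from 0 up to the largest value in the whole array
n_bins = int(simarray.max()) + 1

# Extra keyword arguments are forwarded to np.bincount for every row
counts = np.apply_along_axis(np.bincount, 1, simarray, minlength=n_bins)

# Equivalent spelling with functools.partial
counts_partial = np.apply_along_axis(partial(np.bincount, minlength=n_bins), 1, simarray)

print(counts)
```

Each row of counts now has n_bins entries, so the per-row results stack into a regular 2D array.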

answered Oct 11 '22 06:10

DSM

As @DSM has already mentioned, a row-wise bincount of a 2D array needs a fixed number of bins (at least the array's maximum value plus one); otherwise the per-row results would have inconsistent lengths.

But thanks to numpy's fancy indexing, it is fairly easy to write a faster 2D bincount, since it needs no concatenation, only one preallocated output array.

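import numpy as np

def bincount2d(arr, bins=None):
    # Default number of bins: one more than the largest value in the whole array.
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = np.arange(len(arr))  # one index per row
    # For each column, add 1 at (row, value). Row indices are unique within
    # a column, so the fancy-indexed += never hits the same cell twice.
    for col in arr.T:
        count[indexing, col] += 1
    return count


t = np.array([[1,2,3],[4,5,6],[3,2,2]], dtype=np.int64)
print(bincount2d(t))
# [[0 1 1 1 0 0 0]
#  [0 0 0 0 1 1 1]
#  [0 0 2 1 0 0 0]]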

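P.S.

This:

import time

# small non-negative ints, as bincount requires
t = np.random.randint(0, 10, size=(10000, 100), dtype=np.int64)
s = time.time()
bincount2d(t)
e = time.time()
print(e - s)

runs roughly 2 times faster than this:

t = np.random.randint(0, 10, size=(100, 10000), dtype=np.int64)
s = time.time()
bincount2d(t)
e = time.time()
print(e - s)

because the Python-level for loop runs once per column: 100 iterations in the first case, 10,000 in the second. So if you are free to choose the layout, arrange your data so that shape[0] >= shape[1]. Keep in mind that transposing an existing array also changes which axis is counted.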

Update

A version with no Python-level loop, using np.add.at (I don't think it can be done much better in Python alone):

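def bincount2d(arr, bins=None):
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    # indexing[i, j] == i: the row index of every element of arr
    indexing = (np.ones_like(arr).T * np.arange(len(arr))).T
    # Unbuffered add, so repeated (row, value) pairs are each counted
    np.add.at(count, (indexing, arr), 1)
    return count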
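Another common way to avoid the loop, shown here as a sketch and not as part of the answer above, is to shift each row into its own block of bin indices and call np.bincount once on the flattened result. The helper name bincount2d_offsets is hypothetical. It assumes non-negative integer input, and that bins, if given, is larger than every value in arr.

```python
import numpy as np

def bincount2d_offsets(arr, bins=None):
    """Row-wise bincount using a single flat np.bincount call (illustrative helper)."""
    arr = np.asarray(arr)
    if bins is None:
        bins = int(arr.max()) + 1
    n_rows = arr.shape[0]
    # Shift row i by i * bins so that each row occupies a disjoint block of bins.
    shifted = arr + bins * np.arange(n_rows)[:, None]
    # Count everything at once, then fold the flat counts back into rows.
    flat = np.bincount(shifted.ravel(), minlength=n_rows * bins)
    return flat.reshape(n_rows, bins)

t = np.array([[1, 2, 3], [4, 5, 6], [3, 2, 2]], dtype=np.int64)
result = bincount2d_offsets(t)
print(result)
```

If bins is smaller than some value in arr, that value spills into the next row's block, so the result is wrong or the reshape fails. Validate the input first if that can happen.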

answered Oct 11 '22 06:10

winwin

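This is a loop-free function for a closely related task: it generalizes the weighted form np.bincount(partition, a) to a multi-dimensional a, summing the rows of a that share the same value in partition.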

def sub_sum_partition(a, partition):
    """
    Generalization of np.bincount(partition, a) to multi-dimensional weights.
    Sums the rows of `a` that share the same value in `partition`.

    :param a: NumPy array; its first axis is grouped by `partition`
    :param partition: 1-D NumPy array of non-negative ints, with len(partition) == len(a)
    :return: array of shape (partition.max() + 1, *a.shape[1:]). The i-th element is
    the sum of the rows j of `a` such that partition[j] == i
    """
    assert partition.shape == (len(a),)
    n = np.prod(a.shape[1:], dtype=int)
    bins = ((np.tile(partition, (n, 1)) * n).T + np.arange(n, dtype=int)).reshape(-1)
    sums = np.bincount(bins, a.reshape(-1))
    if n > 1:
        sums = sums.reshape(-1, *a.shape[1:])
    return sums
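To make the semantics concrete, here is a small cross-check sketch. It does not call the function above; it computes the same kind of grouped row sums with np.add.at, which is a different technique, and compares the first column with the weighted 1-D np.bincount. The names a, partition, grouped and first_col are illustrative.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
partition = np.array([0, 1, 0])  # rows 0 and 2 belong to group 0, row 1 to group 1

# One output row per group, same trailing shape as `a`
grouped = np.zeros((partition.max() + 1,) + a.shape[1:])
# Unbuffered add: rows with the same group index accumulate
np.add.at(grouped, partition, a)

# The 1-D weighted bincount gives the same sums for a single column
first_col = np.bincount(partition, weights=a[:, 0])

print(grouped)    # [[6. 8.]
                  #  [3. 4.]]
print(first_col)  # [6. 3.]
```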

answered Oct 11 '22 08:10

Evyatar Cohen
