Handling of duplicate indices in NumPy assignments

Tags: python, numpy

I am setting the values of multiple elements in a 2D array; however, my data sometimes contains multiple values for a given index.

It seems that the "later" value is always assigned (see examples below) but is this behaviour guaranteed or is there a chance I will get inconsistent results? How do I know that I can interpret "later" in the way that I would like in a vectorized assignment?

i.e. in my first example will a definitely always contain 4 and in the second example would it ever print values[0]?

Very simple example:

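    import numpy as np
    a = np.zeros(1, dtype=int)  # assumed: the original snippet never defines a
    indices = np.zeros(5, dtype=int)  # np.int was removed from NumPy; use int
    a[indices] = np.arange(5)
    a  # array([4])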

Another example

    import numpy as np

    grid = np.zeros((1000, 800))

    # generate indices and values
    xs = np.random.randint(0, grid.shape[0], 100)
    ys = np.random.randint(0, grid.shape[1], 100)
    values = np.random.rand(100)

    # make sure we have a duplicate index
    print(values[0], values[5])
    xs[0] = xs[5]
    ys[0] = ys[5]

    grid[xs, ys] = values

    print("output value is", grid[xs[0], ys[0]])
    # always prints value of values[5]

asked Apr 12 '13 14:04

YXD

2 Answers

In NumPy 1.9 and later this will in general not be well defined.

The current implementation iterates over all (broadcast) fancy indices (and the assignment array) at the same time using separate iterators, and these iterators all use C-order. In other words, currently, yes, you can rely on it. For a more exact answer: if you look at mapping.c in NumPy, which handles these things, you will see that it uses PyArray_ITER_NEXT, which is documented to iterate in C-order.
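
To make the C-order point concrete, here is a minimal sketch of what sequential, in-order assignment amounts to. The helper name `assign_in_order` is hypothetical and not part of NumPy. It only mimics the described behaviour with a plain Python loop, so it says nothing about what a given NumPy version does internally.

```python
import numpy as np

def assign_in_order(target, xs, ys, values):
    """Hypothetical helper: assign values one at a time, in index order.

    Later entries overwrite earlier ones, which is the "last value wins"
    outcome that in-order iteration produces.
    """
    for x, y, v in zip(xs, ys, values):
        target[x, y] = v
    return target

grid = np.zeros((4, 3))
xs = np.array([0, 1, 0])
ys = np.array([2, 1, 2])               # (0, 2) appears at positions 0 and 2
values = np.array([10.0, 20.0, 30.0])

assign_in_order(grid, xs, ys, values)
print(grid[0, 2])  # 30.0, the value from the later position
```

The loop is slow for large inputs, but it states the intended ordering explicitly instead of relying on an implementation detail.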

For the future, I would paint the picture differently. I think it would be good to iterate over all indices plus the assignment array together using the newer iterator. If this is done, the order could be left open for the iterator to choose the fastest way. If it is left to the iterator, it is hard to say what would happen, and you cannot be certain that your example works (in the 1-d case you probably still can, but...).

So, as far as I can tell, it works currently, but it is undocumented (as far as I know), so if you think this should be guaranteed, you would need to lobby for it and, ideally, write some tests to make sure it stays guaranteed. I, at least, am tempted to say: if it makes things faster, there is no reason to ensure C-order, but of course maybe there is a good reason hidden somewhere...
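
If "last value wins" is needed regardless of iteration order, one option is to drop the duplicate indices before assigning, so that each target element is written exactly once. The following is a sketch under that assumption, using a small hand-made example rather than the data from the question. The variable names are illustrative only.

```python
import numpy as np

grid = np.zeros((4, 3))
xs = np.array([3, 1, 3, 0, 3])
ys = np.array([2, 2, 2, 0, 1])          # (3, 2) appears at positions 0 and 2
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Collapse each (x, y) pair into a single flat index so duplicates are easy to find.
flat = np.ravel_multi_index((xs, ys), grid.shape)

# np.unique reports the FIRST occurrence of each value, so search the reversed
# array and map the positions back: that yields the LAST occurrence in the original.
_, first_in_reversed = np.unique(flat[::-1], return_index=True)
keep = len(flat) - 1 - first_in_reversed

# Every index in the assignment is now unique, so the order no longer matters.
grid[xs[keep], ys[keep]] = values[keep]

print(grid[3, 2])  # 3.0, from the last occurrence of (3, 2)
```

This costs a sort inside `np.unique`, which is usually acceptable when correctness should not depend on undocumented behaviour.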

The real question here is: Why do you want that anyway? ;)

answered Sep 30 '22 16:09

seberg

I know this has been satisfactorily answered, but I wanted to mention that it is documented as being the "last value" (perhaps informally) in the Tentative Numpy Tutorial under Indexing with Arrays of Indices:

However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:
    >>> a = arange(5)
    >>> a[[0,0,2]]=[1,2,3]
    >>> a
    array([2, 1, 3, 3, 4])
This is reasonable enough, but watch out if you want to use Python's += construct, as it may not do what you expect:
    >>> a = arange(5)
    >>> a[[0,0,2]]+=1
    >>> a
    array([1, 1, 3, 3, 4])
Even though 0 occurs twice in the list of indices, the 0th element is only incremented once. This is because Python requires a+=1 to be equivalent to a=a+1.
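
If the goal is to accumulate once per repeated index, `np.add.at` (available since NumPy 1.8) performs the operation unbuffered. A minimal sketch contrasting it with the `+=` form from the quoted tutorial:

```python
import numpy as np

# Buffered in-place add: a repeated index is only incremented once.
a = np.arange(5)
a[[0, 0, 2]] += 1
print(a)  # [1 1 3 3 4]

# Unbuffered add: every occurrence of an index contributes.
b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)
print(b)  # [2 1 3 3 4]
```

`np.add.at` is generally slower than plain fancy-index arithmetic, so it is worth using only when the indices can actually repeat.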

answered Sep 30 '22 18:09

askewchan
