Indexing numpy array with index array of lower dim yields array of higher dim than both

Tags: python, arrays, numpy

import numpy as np

a = np.zeros((5, 4, 3))
v = np.ones((5, 4), dtype=int)  # integer index array, every entry is 1
data = a[v]
shp = data.shape

This code gives shp == (5, 4, 4, 3).

I don't understand why. How can the output be larger than the input? It makes no sense to me, and I would love an explanation.

asked Jul 09 '19 07:07

Gulzar

1 Answer

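This is known as advanced indexing. Advanced indexing allows you to select arbitrary elements of the input array using an N-dimensional array of indices. When a single integer array indexes the first axis, the result has the shape of the index array followed by the shape of the axes that were not indexed.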
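
As a minimal sketch of that idea on a smaller case (the array contents and the name idx are made up for illustration), a 1-D integer index picks whole rows, and the same row may be picked more than once:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)  # 4 rows, 3 columns
idx = np.array([0, 2, 2])        # pick row 0, then row 2 twice

picked = a[idx]
print(picked.shape)  # (3, 3): one row of a per entry in idx
print(picked)
```

Because entries of the index may repeat, the result is not limited to the size of the input.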

Let's use another example to make it clearer:

import numpy as np

a = np.random.randint(1, 5, (5, 4, 3))  # random integers in [1, 5)
v = np.ones((5, 4), dtype=int)

Say in this case a is:

array([[[2, 1, 1],
        [3, 4, 4],
        [4, 3, 2],
        [2, 2, 2]],

       [[4, 4, 1],
        [3, 3, 4],
        [3, 4, 2],
        [1, 3, 1]],

       [[3, 1, 3],
        [4, 3, 1],
        [2, 1, 4],
        [1, 2, 2]],
        ...

Here v is an integer array filled with ones:

v

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

Indexing a with v uses each entry of v as an index along the first axis, so a[1] is selected once for every element of v. Put another way, when you do:

a[1]

[[4, 4, 1],
 [3, 3, 4],
 [3, 4, 2],
 [1, 3, 1]]

You're indexing along the first axis only, since no index is given for the remaining axes. It is the same as doing a[1, ...], i.e. taking a full slice along the remaining axes. Hence, by indexing with a 2D array of ones of shape (5, 4), you get the above 2D array once per entry of the index, resulting in an ndarray of shape (5, 4, 4, 3): the index shape (5, 4) followed by the shape (4, 3) of a[1]. In other words, a[1] stacked 5*4=20 times. Advanced indexing returns a copy rather than a view, so nothing prevents the result from holding more elements than a.
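
The shape reasoning above can be checked directly. This is only a sketch using the same a and v as in this answer; the comparison against np.broadcast_to is an extra illustration and not part of the original code:

```python
import numpy as np

a = np.random.randint(1, 5, (5, 4, 3))
v = np.ones((5, 4), dtype=int)

data = a[v]

# Result shape = index shape followed by the un-indexed axes of a.
assert data.shape == v.shape + a.shape[1:]  # (5, 4) + (4, 3)

# Every (i, j) position of the result holds a copy of a[v[i, j]], i.e. a[1].
assert all(np.array_equal(data[i, j], a[1])
           for i in range(5) for j in range(4))

# Same values as a[1] broadcast to the full result shape.
assert np.array_equal(data, np.broadcast_to(a[1], (5, 4, 4, 3)))
```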

Hence, in this case you'd be getting:

array([[[[4, 4, 1],
         [3, 3, 4],
         [3, 4, 2],
         [1, 3, 1]],

        [[4, 4, 1],
         [3, 3, 4],
         [3, 4, 2],
         [1, 3, 1]],
         ...
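
The same rule holds when the index array contains different values. A small sketch, assuming a hypothetical 2x2 index with mixed entries instead of all ones:

```python
import numpy as np

a = np.arange(5 * 4 * 3).reshape(5, 4, 3)
v = np.array([[0, 4],
              [2, 2]])  # hypothetical index array with mixed values

data = a[v]
print(data.shape)  # (2, 2, 4, 3)

# data[i, j] is the (4, 3) block a[v[i, j]]
for i in range(2):
    for j in range(2):
        assert np.array_equal(data[i, j], a[v[i, j]])
```

Here the output is smaller than a, which shows that the result shape depends on the shape of the index and the un-indexed axes, not on the size of the first axis of a.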

answered Oct 15 '22 22:10

yatu
