
How is numpy's fancy indexing implemented?

Tags:

python

arrays

indexing

numpy

I was doing a little experimentation with 2D lists and numpy arrays, and it raised three questions I'm curious about.

First, I initialized a 2D python list.

>>> my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

I then tried indexing the list with a tuple.

>>> my_list[:,]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple

Since the interpreter throws me a TypeError and not a SyntaxError, I surmised it is actually possible to do this, but python does not natively support it.

I then tried converting the list to a numpy array and doing the same thing.

>>> np.array(my_list)[:,]
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Of course, no problem. My understanding is that one of the __xx__() methods has been overridden and implemented in the numpy package.

Numpy's indexing supports lists too:

>>> np.array(my_list)[:,[0, 1]]
array([[1, 2],
       [4, 5],
       [7, 8]])

This has raised a couple of questions:

Which __xx__ method has numpy overridden/defined to handle fancy indexing?
Why don't python lists natively support fancy indexing?

(Bonus question: why do my timings show that slicing in python2 is slower than python3?)

682

asked Jun 15 '17 18:06

cs95

2 Answers

You have three questions:

1. Which __xx__ method has numpy overridden/defined to handle fancy indexing?

The indexing operator [] is overridable using __getitem__, __setitem__, and __delitem__. It can be fun to write a simple subclass that offers some introspection:

>>> class VerboseList(list):
...     def __getitem__(self, key):
...         print(key)
...         return super().__getitem__(key)
...

Let's make an empty one first:

>>> l = VerboseList()

Now fill it with some values. Note that we haven't overridden __setitem__ so nothing interesting happens yet:

>>> l[:] = range(10)

Now let's get an item. At index 0 will be 0:

>>> l[0]
0
0

If we try to use a tuple, we get an error, but we get to see the tuple first!

>>> l[0, 4]
(0, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __getitem__
TypeError: list indices must be integers or slices, not tuple

We can also find out how python represents slices internally:

>>> l[1:3]
slice(1, 3, None)
[1, 2]

There are lots more fun things you can do with this object -- give it a try!
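As one example of what overriding __getitem__ allows, here is a minimal sketch of a list subclass that accepts a list of indices as its key. The name FancyList is hypothetical and this is not how numpy does it (numpy implements indexing in C); it only shows that the hook is the same one.

```python
class FancyList(list):
    """Hypothetical list subclass that accepts a list of indices as a key."""

    def __getitem__(self, key):
        # A list key selects several items, loosely like numpy's fancy indexing.
        if isinstance(key, list):
            return FancyList(list.__getitem__(self, i) for i in key)
        # Integers and slices fall through to the normal list behaviour.
        return list.__getitem__(self, key)


fl = FancyList(range(10, 20))
print(fl[[0, 3, -1]])   # [10, 13, 19]
print(fl[2])            # 12
print(fl[1:3])          # [11, 12]
```

Tuple keys are deliberately left unsupported here, so fl[0, 1] still raises the usual TypeError.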

2. Why don't python lists natively support fancy indexing?

This is hard to answer. One way of thinking about it is historical: because the numpy developers thought of it first.

You youngsters. When I was a kid...

Upon its first public release in 1991, Python had no numpy library, and to make a multi-dimensional list, you had to nest list structures. I assume that the early developers -- in particular, Guido van Rossum (GvR) -- felt that keeping things simple was best, initially. Slice indexing was already pretty powerful.

However, not too long after, interest grew in using Python as a scientific computing language. Between 1995 and 1997, a number of developers collaborated on a library called numeric, an early predecessor of numpy. Though he wasn't a major contributor to numeric or numpy, GvR coordinated with the numeric developers, extending Python's slice syntax in ways that made multidimensional array indexing easier. Later, an alternative to numeric arose called numarray; and in 2006, numpy was created, incorporating the best features of both.

These libraries were powerful, but they required heavy C extensions and so on. Working them into the base Python distribution would have made it bulky. And although GvR did enhance slice syntax a bit, adding fancy indexing to ordinary lists would have changed their API dramatically -- and somewhat redundantly. Given that fancy indexing could be had with an outside library already, the benefit wasn't worth the cost.

Parts of this narrative are speculative, in all honesty.¹ I don't know the developers really! But it's the same decision I would have made. In fact...

It really should be that way.

Although fancy indexing is very powerful, I'm glad it's not part of vanilla Python even today, because it means that you don't have to think very hard when working with ordinary lists. For many tasks you don't need it, and the cognitive load it imposes is significant.

Keep in mind that I'm talking about the load imposed on readers and maintainers. You may be a whiz-bang genius who can do 5-d tensor products in your head, but other people have to read your code. Keeping fancy indexing in numpy means people don't use it unless they honestly need it, which makes code more readable and maintainable in general.

3. Why is numpy's fancy indexing so slow on python2? Is it because I don't have native BLAS support for numpy in this version?

Possibly. It's definitely environment-dependent; I don't see the same difference on my machine.

¹ The parts of the narrative that aren't as speculative are drawn from a brief history told in a special issue of Computing in Science and Engineering (2011 vol. 13).

137

answered Sep 28 '22 08:09

senderle

my_list[:,] is translated by the interpreter into

my_list.__getitem__((slice(None, None, None),))

It's like calling a function with *args, but it takes care of translating the : notation into a slice object. Without the , it would just pass the slice. With the , it passes a tuple.
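A quick way to see this translation for yourself is an object whose __getitem__ simply returns the key it was given. KeyEcho is a made-up helper name for this sketch; the printed values are what the interpreter builds from the subscript syntax.

```python
class KeyEcho:
    """Hypothetical helper: indexing it returns the key unchanged."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()
print(echo[:])          # slice(None, None, None)
print(echo[:,])         # (slice(None, None, None),)
print(echo[1:3, 0])     # (slice(1, 3, None), 0)
print(echo[:, [0, 1]])  # (slice(None, None, None), [0, 1])
print(echo[..., 2])     # (Ellipsis, 2)
```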

The list __getitem__ does not accept a tuple, as shown by the error. An array __getitem__ does. I believe the ability to pass a tuple and create slice objects was added as a convenience for numpy (or its predecessors). The tuple notation has never been added to the list __getitem__. (There is an operator.itemgetter class that allows a form of advanced indexing, but it just calls __getitem__ once for each requested item.)
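For reference, this is roughly what the operator.itemgetter approach looks like on the nested list from the question. It is only a sketch of a list-based stand-in for arr[:, [0, 1]]; the variable names are mine.

```python
from operator import itemgetter

my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# itemgetter(0, 1) builds a callable that returns (obj[0], obj[1]).
first_two = itemgetter(0, 1)

# Roughly the list analogue of arr[:, [0, 1]]: apply it to every row.
columns = [list(first_two(row)) for row in my_list]
print(columns)  # [[1, 2], [4, 5], [7, 8]]
```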

With an array you can use the tuple notation directly:

In [490]: np.arange(6).reshape((2,3))[:,[0,1]]
Out[490]:
array([[0, 1],
       [3, 4]])

In [491]: np.arange(6).reshape((2,3))[(slice(None),[0,1])]
Out[491]:
array([[0, 1],
       [3, 4]])

In [492]: np.arange(6).reshape((2,3)).__getitem__((slice(None),[0,1]))
Out[492]:
array([[0, 1],
       [3, 4]])

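Look at the numpy/lib/index_tricks.py file for examples of fun stuff you can do with __getitem__. In NumPy releases before 2.0 (np.source was removed in 2.0, where inspect.getsource is the replacement) you can view the file with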

np.source(np.lib.index_tricks)

A nested list is a list of lists:

In a nested list, the sublists are independent of the containing list. The container just has pointers to objects elsewhere in memory:

In [494]: my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [495]: my_list
Out[495]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [496]: len(my_list)
Out[496]: 3

In [497]: my_list[1]
Out[497]: [4, 5, 6]

In [498]: type(my_list[1])
Out[498]: list

In [499]: my_list[1]='astring'

In [500]: my_list
Out[500]: [[1, 2, 3], 'astring', [7, 8, 9]]

Here I change the 2nd item of my_list; it is no longer a list, but a string.

If I apply [:] to a list I just get a shallow copy:

In [501]: xlist = my_list[:]

In [502]: xlist[1] = 43

In [503]: my_list           # didn't change my_list
Out[503]: [[1, 2, 3], 'astring', [7, 8, 9]]

In [504]: xlist
Out[504]: [[1, 2, 3], 43, [7, 8, 9]]

but changing an element of a list in xlist does change the corresponding sublist in my_list:

In [505]: xlist[0][1]=43

In [506]: my_list
Out[506]: [[1, 43, 3], 'astring', [7, 8, 9]]

To me this shows why n-dimensional indexing (as implemented for numpy arrays) doesn't make sense with nested lists. Nested lists are multidimensional only to the extent that their contents allow; there's nothing structurally or syntactically multidimensional about them.
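A small sketch of the same point, using a deliberately ragged list (my own example data): chained indexing is just two independent __getitem__ calls, and picking a "column" has to be spelled out as a loop.

```python
ragged = [[1, 2, 3], [4, 5], "xyz"]

# Chained indexing is two separate __getitem__ calls, one per level.
print(ragged[0][1])   # 2
print(ragged[2][1])   # y  (the "row" is a string here)

# Selecting a "column" needs an explicit loop; the outer list knows nothing
# about what its items contain.
second_items = [row[1] for row in ragged]
print(second_items)   # [2, 5, 'y']
```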

the timings

Using two [:] on a list does not make a deep copy or work its way down the nesting. It just repeats the shallow copy step:

In [507]: ylist=my_list[:][:]

In [508]: ylist[0][1]='boo'

In [509]: xlist
Out[509]: [[1, 'boo', 3], 43, [7, 8, 9]]

arr[:,] just makes a view of arr. The difference between view and copy is part of understanding the difference between basic and advanced indexing.

So alist[:][:] and arr[:,] are different, but basic ways of making some sort of copy of lists and arrays. Neither computes anything, and neither iterates through the elements. So a timing comparison doesn't tell us much.
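The view/copy distinction can be checked directly. This is a minimal sketch assuming numpy is installed; np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

view = arr[:,]           # basic indexing: a view onto the same buffer
picked = arr[:, [0, 1]]  # advanced (fancy) indexing: a new array

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, picked))  # False

view[0, 0] = 100    # writes through to arr
picked[0, 1] = -1   # does not touch arr
print(arr[0, 0], arr[0, 1])  # 100 1
```

So arr[:,] costs almost nothing regardless of array size, while arr[:, [0, 1]] has to allocate and fill a new array.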

answered Sep 28 '22 08:09

hpaulj
