
NumPy List Comprehension Syntax

Tags: python, numpy

I'd like to be able to use list comprehension syntax to work with NumPy arrays easily.

For instance, I would like something like the code below (which is obviously wrong) to simply reproduce the same array.

>>> X = np.random.randn(8,4)
>>> [[X[i,j] for i in X] for j in X[i]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type

What is the easy way to do this without using range(len(X))?

Andrew Latham asked Jan 26 '14 05:01



2 Answers

First, you should not be using NumPy arrays as lists of lists.

Second, let's forget about NumPy; your listcomp doesn't make any sense in the first place, even for lists of lists.

In the inner comprehension, for i in X is going to iterate over the rows in X. Those rows aren't numbers, they're lists (or, in NumPy, 1D arrays), so X[i] makes no sense whatsoever. You may have wanted i[j] instead.

In the outer comprehension, for j in X[i] has the same problem, but it has an even bigger problem: there is no i value at that point. The only loop over i is in the inner comprehension, which runs inside this one, so i does not exist yet when X[i] is evaluated.

If you're confused by a comprehension, write it out as an explicit for statement, as explained in the tutorial section on List Comprehensions:

tmp = []
for j in X[i]:
    tmp.append([X[i,j] for i in X])

… which expands to:

tmp = []
for j in X[i]:
    tmp2 = []
    for i in X:
        tmp2.append(X[i,j])
    tmp.append(tmp2)

… which should make it obvious what's wrong here.
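To see the failure in isolation, here is a minimal sketch that wraps the broken comprehension in a function (the name broken is made up for illustration). It assumes no variable called i exists in the enclosing scope, in which case Python 3 raises a NameError before the inner loop ever runs. The question's traceback shows an IndexError instead, which would be consistent with an i left over from earlier in the session, though that is only a guess.

```python
import numpy as np

def broken(X):
    # X[i] in the outer "for" clause is evaluated first, but i is only
    # bound inside the inner comprehension, so no i exists yet.
    return [[X[i, j] for i in X] for j in X[i]]

X = np.random.randn(8, 4)

try:
    broken(X)
    raised_name_error = False
except NameError:
    raised_name_error = True

print(raised_name_error)
```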


I think what you wanted was:

[[cell for cell in row] for row in X]

Again, turn it back into explicit for statements:

tmp = []
for row in X:
    tmp2 = []
    for cell in row:
        tmp2.append(cell)
    tmp.append(tmp2)

That's obviously right.
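As a quick check, this minimal sketch rebuilds the array with the element-wise comprehension and compares it with the original. The seeded generator and the names nested and rebuilt are only there for illustration; any 2-D array would do.

```python
import numpy as np

# Seeded generator so the sketch is reproducible.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))

# Iterate over rows, then over the cells of each row; no indices needed.
nested = [[cell for cell in row] for row in X]

# Wrapping the nested list in np.array gives back an equal array.
rebuilt = np.array(nested)
print(rebuilt.shape)               # (8, 4)
print(np.array_equal(rebuilt, X))  # True
```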

Or, if you really want to use indexing (but you don't):

[[X[i][j] for j in range(len(X[i]))] for i in range(len(X))]

So, back to NumPy. In NumPy terms, that last version is:

[[X[i,j] for j in range(X.shape[1])] for i in range(X.shape[0])]

… and if you want to go in column-major order instead of row-major, you can (unlike with a list of lists):

[[X[i,j] for i in range(X.shape[0])] for j in range(X.shape[1])]

… but that will of course transpose the array, which isn't what you wanted to do.

The one thing you can't do is mix up column-major and row-major order in the same expression, because you end up with nonsense.
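A small sketch of the two index orders, using a 2x3 array chosen purely for illustration: the row-major version reproduces X, and the column-major version produces its transpose.

```python
import numpy as np

X = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
rows, cols = X.shape

# Outer loop over rows, inner loop over columns: same layout as X.
row_major = [[X[i, j] for j in range(cols)] for i in range(rows)]

# Outer loop over columns, inner loop over rows: the transpose of X.
col_major = [[X[i, j] for i in range(rows)] for j in range(cols)]

print(np.array_equal(np.array(row_major), X))    # True
print(np.array_equal(np.array(col_major), X.T))  # True
```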


Of course the right way to make a copy of an array is to use the copy method:

X.copy()

Just as the right way to transpose an array is:

X.T
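
For illustration, a short sketch of how the two behave on a small example array (the names Y and T are arbitrary): copy gives independent data, while T on an ndarray gives a transposed view of the same data.

```python
import numpy as np

X = np.arange(6).reshape(2, 3)

Y = X.copy()      # independent copy: changing Y leaves X alone
Y[0, 0] = 99
print(X[0, 0])    # 0

T = X.T           # transposed view of the same underlying data
print(T.shape)    # (3, 2)
print(np.shares_memory(X, T))  # True
```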
abarnert answered Oct 18 '22 03:10


The easy way is to not do this. Use numpy's implicit vectorization instead. For example, if you have arrays A and B as follows:

A = numpy.array([[1, 3, 5],
                 [2, 4, 6],
                 [9, 8, 7]])
B = numpy.array([[5, 3, 5],
                 [3, 5, 3],
                 [5, 3, 5]])

then the following code using list comprehensions:

C = numpy.array([[A[i, j] * B[i, j] for j in range(A.shape[1])]
                 for i in range(A.shape[0])])

can be much more easily written as

C = A * B

It'll also run much faster. Generally, you will produce faster, clearer code if you don't use list comprehensions with numpy than if you do.
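
Putting the pieces above into one runnable sketch (written for Python 3, so it uses range; the names C_loop and C_vec are just for this comparison), both approaches give the same result:

```python
import numpy

A = numpy.array([[1, 3, 5],
                 [2, 4, 6],
                 [9, 8, 7]])
B = numpy.array([[5, 3, 5],
                 [3, 5, 3],
                 [5, 3, 5]])

# Element-by-element product built with nested list comprehensions.
C_loop = numpy.array([[A[i, j] * B[i, j] for j in range(A.shape[1])]
                      for i in range(A.shape[0])])

# The same element-wise product, vectorized.
C_vec = A * B

print(C_vec)
print(numpy.array_equal(C_loop, C_vec))  # True
```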

If you really want to use list comprehensions, standard Python list-comprehension-writing techniques apply. Iterate over the elements, not the indices:

C = numpy.array([[a*b for a, b in zip(a_row, b_row)]
                 for a_row, b_row in zip(A, B)])

Thus, your example code would become

numpy.array([[elem for elem in x_row] for x_row in X])
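
A self-contained sketch of the zip-based style, reusing the A and B values from earlier (the name C_zip is only for illustration), confirms it matches the vectorized product:

```python
import numpy

A = numpy.array([[1, 3, 5], [2, 4, 6], [9, 8, 7]])
B = numpy.array([[5, 3, 5], [3, 5, 3], [5, 3, 5]])

# Pair up rows of A and B, then pair up the elements within each row.
C_zip = numpy.array([[a * b for a, b in zip(a_row, b_row)]
                     for a_row, b_row in zip(A, B)])

print(numpy.array_equal(C_zip, A * B))  # True
```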
user2357112 supports Monica answered Oct 18 '22 02:10