
Numpy: views vs copy by slicing

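While slicing an array, I ran into something unexpected: the first case below seems to give a view, but the second seems to give a copy.

First

Slicing rows first and then columns seems to give a view: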

>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> a[0:3:2, :][:, [0, 2]] = 100
>>> a
array([[100,   1, 100,   3],
       [  4,   5,   6,   7],
       [100,   9, 100,  11]])

Second

But if I slice columns first and then rows, it seems to give a copy:

>>> a[:, [0, 2]][0:3:2, :] = 0
>>> a
array([[100,   1, 100,   3],
       [  4,   5,   6,   7],
       [100,   9, 100,  11]])

I am confused because both methods target the same positions, so why doesn't the second one actually change the values?


asked Nov 08 '17 13:11

Caesium

2 Answers

The accepted answer by John Zwinck is actually incorrect (I just figured this out the hard way!). The behavior in the question comes from combining "l-value indexing" with NumPy's fancy indexing. The following document explains exactly this case:

https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html

in the section "But fancy indexing does seem to return views sometimes, doesn't it?"

Edit:

To summarize the above link:

Whether a view or a copy is created is determined by whether the indexing can be represented as a slice.

Exception: If one does "fancy indexing", a copy is always created. Fancy indexing means indexing with a list or array of indices, such as a[[1, 2]].
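A quick way to check these two rules is np.shares_memory. The following is only a minimal sketch; the variable names rows_slice and cols_fancy are mine and not from the question.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

rows_slice = a[0:3:2, :]   # basic slicing (start:stop:step)
cols_fancy = a[:, [0, 2]]  # fancy indexing (a list of indices)

# A view shares its data buffer with `a`; a copy does not.
print(np.shares_memory(a, rows_slice))  # True
print(np.shares_memory(a, cols_fancy))  # False
```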

Exception to the exception: If one does l-value indexing (i.e. the indexing happens on the left of the = sign), then the rule for when a view or a copy is created no longer applies (though see below for a further exception). Python turns the statement into a call to the array's __setitem__ method, which writes the values directly into the array on the left-hand side without creating a copy or a view.
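To illustrate, an indexed assignment and an explicit __setitem__ call should leave two identical arrays in the same state. This is just a sketch; b is a hypothetical second array introduced for the comparison.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = a.copy()

a[:, [0, 2]] = 100                         # assignment syntax with a fancy index
b.__setitem__((slice(None), [0, 2]), 100)  # the equivalent explicit method call

# Both arrays were modified in place, in the same way.
print(np.array_equal(a, b))  # True
print(a[0])
```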

To show that, outside of an assignment, both expressions from the question create a copy, you can do the operation in two steps:

>>> a = np.arange(12).reshape(3, 4)
>>> b = a[0:3:2, :][:, [0, 2]]
>>> b[:] = 100
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

and

>>> b = a[:, [0, 2]][0:3:2, :]
>>> b[:] = 0
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

As an aside, the original poster's question is exactly the problem stated at the end of the scipy-cookbook link above, and the cookbook gives no solution. The tricky part of the question is that two indexing operations are chained one after the other.

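Exception to the exception to the exception: If two indexing operations are chained on the left-hand side (as is the case in this question), only the last one is the actual assignment target; the first one is evaluated as an ordinary read. The direct assignment therefore reaches the original array only if the first indexing operation can be represented as a slice (and so returns a view). Otherwise the first operation creates a copy, and the assignment goes into that copy even though it is l-value indexing.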
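The second case from the question can be unrolled by hand to see where the copy appears. This is a rough sketch of the evaluation order, not what the interpreter literally executes; tmp and unchanged are illustrative names. The last line shows one possible way to get the intended effect, by putting both selections into a single index so that a itself is the assignment target.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# a[:, [0, 2]][0:3:2, :] = 0 is evaluated roughly as:
tmp = a[:, [0, 2]]   # read with a fancy index: a temporary copy
tmp[0:3:2, :] = 0    # the assignment lands in the copy, not in `a`
unchanged = np.array_equal(a, np.arange(12).reshape(3, 4))
print(unchanged)     # True

# A single indexing operation makes `a` the target of the assignment.
a[0:3:2, [0, 2]] = 0
print(a)
```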

answered Sep 28 '22 02:09

Maltimore

All that matters is whether you slice by rows or by columns. Slicing by rows can return a view because it is a contiguous segment of the original array. Slicing by column must return a copy because it is not a contiguous segment. For example:

A1 A2 A3
B1 B2 B3
C1 C2 C3

By default, it is stored in memory this way:

A1 A2 A3 B1 B2 B3 C1 C2 C3

So if you want to choose every second row, it is:

[A1 A2 A3] B1 B2 B3 [C1 C2 C3]

That can be described as {start: 0, size: 3, stride: 6}.
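The row stride can be read off a real array. A minimal sketch, assuming a 3x3 integer array stands in for the A/B/C grid above; NumPy reports strides in bytes, so they are divided by the item size to get element counts.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
every_second_row = a[::2]

# Convert the byte stride along the row axis into a number of elements.
row_stride = every_second_row.strides[0] // a.itemsize
print(row_stride)                             # 6
print(every_second_row.shape)                 # (2, 3)
print(np.shares_memory(a, every_second_row))  # True
```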

But if you want to choose every second column:

[A1] A2 [A3 B1] B2 [B3 C1] C2 [C3]

And there is no way to describe that using a single start, size, and stride. So there is no way to construct such a view.
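It may be worth checking this claim directly, because NumPy arrays carry one stride per axis rather than a single stride. In the sketch below (variable names are illustrative), a basic column slice does share memory with the original array, while selecting the same columns with a list does not, which is consistent with the fancy-indexing explanation in the other answer.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
every_second_col = a[:, ::2]  # basic slice along the column axis
picked_cols = a[:, [0, 2]]    # same columns via a list (fancy indexing)

print(np.shares_memory(a, every_second_col))  # True
print(np.shares_memory(a, picked_cols))       # False

# Per-axis strides, converted from bytes to elements.
strides_in_elements = tuple(s // a.itemsize for s in every_second_col.strides)
print(strides_in_elements)  # (3, 2)
```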

If you want to be able to view every second column instead of every second row, you can construct your array in column-major aka Fortran order instead:

np.array(a, order='F')

Then it will be stored as such:

A1 B1 C1 A2 B2 C2 A3 B3 C3
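The column-major layout can be inspected the same way. A small sketch, again assuming a 3x3 integer array in place of the A/B/C grid; f is an illustrative name.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
f = np.array(a, order='F')  # same values, stored column by column

print(f.flags['F_CONTIGUOUS'])  # True
print(np.array_equal(a, f))     # True

# Moving down a column now advances one element in memory.
f_strides = tuple(s // f.itemsize for s in f.strides)
print(f_strides)                # (1, 3)
print(f.ravel(order='F'))       # [0 3 6 1 4 7 2 5 8]
```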

answered Sep 28 '22 02:09

John Zwinck


Tags: python, slice, numpy