Does numpy allocate new matrices for every operation you perform on a matrix?
For example:
A = np.random.rand(10, 20)
A = 2 * A # Operation 1: Is a copy of A made, and a reference assigned to A?
B = 2 * A # Operation 2: Does B get a completely different copy of A?
C = A # Operation 3: Does C get a reference to A?
And slice operations:
A[0, :] = 3
How about chained operations?
D = A * B * C # Elementwise multiplication, if A * B allocates memory, does
# (A * B) * C allocate another patch of memory?
Numpy's a fantastic library, but I just want to know what happens under the hood. My intuition says that slice operations modify the memory view in place, but I don't know about assignments.
The size in memory of a numpy array is easy to calculate: it's simply the number of elements times the item size, plus a small constant overhead. For example, if an array's dtype is int64, each element takes 64 / 8 = 8 bytes, so 1,000,000 elements require 8,000,000 bytes (8 MB).
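You can check this directly; itemsize and nbytes are standard ndarray attributes:

import numpy as np

a = np.zeros(1_000_000, dtype=np.int64)
print(a.itemsize)   # 8 bytes per int64 element
print(a.nbytes)     # 8000000: number of elements times itemsize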
Keep in mind that a numpy array is a Python object, and Python creates and deletes objects continually. The array has attributes shown in the .flags and .__array_interface__ dictionaries, things like the shape and dtype. The attribute that takes up (potentially) a lot of memory is the data buffer. It may be a few bytes long, or it may be many MB.
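A quick way to inspect those attributes (a minimal sketch using the standard ndarray attributes):

import numpy as np

A = np.random.rand(10, 20)
print(A.flags)                         # OWNDATA, WRITEABLE, C_CONTIGUOUS, ...
print(A.__array_interface__['data'])   # (address of the data buffer, read-only flag)
print(A.nbytes)                        # 1600: the buffer holds 200 floats * 8 bytes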
Where possible, numpy operations try to avoid copying the data buffer. When indexing, numpy will return a view if possible. I think the documentation compares views and copies well enough.
But views are different from Python references. A shared reference means two variables (or pointers in a list or dictionary) point to the same Python object. A view is a different array object, but one which shares the data buffer with another array. A copy has its own data buffer.
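A small sketch of the three cases side by side, using the standard np.shares_memory helper:

import numpy as np

A = np.arange(6)
ref = A            # shared reference: same Python object
view = A[1:4]      # view: new object, shared data buffer
copy = A.copy()    # copy: new object, new data buffer

print(ref is A)                                # True
print(view is A, np.shares_memory(view, A))    # False True
print(copy is A, np.shares_memory(copy, A))    # False False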
In your examples:
A = np.random.rand(10, 20)
A is a variable pointing to an array object. That object has a data buffer holding 200 floats (200 * 8 bytes).
A = 2 * A # Operation 1: Is a copy of A made, and a reference assigned to A?
2 * A creates a new object with a new data buffer; none of its data is shared with the original A. The A = ... then reassigns the variable A to this new object. The old A object is 'lost', and its memory is eventually garbage collected.
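You can watch the rebinding happen with id(), which reports object identity:

import numpy as np

A = np.random.rand(10, 20)
old_id = id(A)
A = 2 * A                 # builds a new array, then rebinds the name A
print(id(A) == old_id)    # False: A now points at a different object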
B = 2 * A # Operation 2: Does B get a completely different copy of A?
This 2 * A operates on the new A array. The result is assigned to B; A remains unchanged.
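A quick check of both claims (a minimal sketch):

import numpy as np

A = np.random.rand(10, 20)
snapshot = A.copy()
B = 2 * A
print(np.shares_memory(A, B))   # False: B has a completely separate buffer
print((A == snapshot).all())    # True: A itself was not modified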
C = A # Operation 3: Does C get a reference to A?
Yes, this is just normal Python assignment. C refers to the same object as A, so id(C) == id(A).
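For example:

import numpy as np

A = np.random.rand(10, 20)
C = A
print(C is A)      # True: one array object, two names
C[0, 0] = 99.0
print(A[0, 0])     # 99.0: changing it through C changes A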
B = A[1,:] # B is a view
B is a reference to a new array object, but that object shares the data buffer with A. That's because the desired values can be found in the buffer by just starting at a different point and using a different shape.
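You can confirm the shared buffer through the view's base attribute:

import numpy as np

A = np.random.rand(10, 20)
B = A[1, :]          # a view: new array object, shared buffer
print(B.base is A)   # True: B's memory belongs to A
B[0] = -1.0
print(A[1, 0])       # -1.0: writing through the view changes A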
A[0, :] = 3
This LHS slice will change a subset of the values of A. It is similar to:
B = A[0, :]
B[:] = 3
But there are subtle differences between LHS and RHS slices. On the LHS you have to pay more attention to when you get a copy as opposed to a view. I've seen this especially with expressions like A[idx1, :][:, idx2] = 3.
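A concrete sketch of that pitfall (the index values are made up for illustration): chained fancy indexing assigns into a temporary copy, while np.ix_ indexes A directly:

import numpy as np

A = np.zeros((4, 4))
idx1, idx2 = [0, 2], [1, 3]

A[idx1, :][:, idx2] = 3     # A[idx1, :] is a fancy-indexed copy, so A is untouched
print(A.sum())              # 0.0

A[np.ix_(idx1, idx2)] = 3   # indexes A directly, so the assignment sticks
print(A.sum())              # 12.0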
D = A * B * C
The details of how many intermediate copies are made in a calculation like this are buried in the numpy C code. It's safest to assume that it does something like:
temp1 = A*B
temp2 = temp1*C
D = temp2
(temp1 goes to garbage)
For ordinary calculations it isn't worth worrying about these details. If you are really pushing for speed you could run timeit on alternatives. And occasionally we get SO questions about operations giving memory errors; do a search for more details on those.
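If the temporaries ever do matter, one workaround (a sketch, not a guaranteed speedup) is to reuse a buffer via the ufunc out= argument:

import numpy as np

A = np.random.rand(10, 20)
B = np.random.rand(10, 20)
C = np.random.rand(10, 20)

D = np.multiply(A, B)       # one allocation for the A*B intermediate
np.multiply(D, C, out=D)    # reuse that buffer; no second temporary
# D now equals A * B * C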
Yes, it creates new arrays, except in the case of C: C and A point to the same memory.
You can test all of this yourself. Use id(A) to see where in memory A is pointing. Also, just create a smaller structure, modify parts of it, and then see whether A, B, and/or C are also updated.
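For example, a small experiment along those lines:

import numpy as np

A = np.arange(6.0)
B = 2 * A
C = A
print(id(A) == id(C), id(A) == id(B))   # True False
A[0] = 100.0
print(B[0], C[0])                        # 0.0 100.0: C tracked A, B did not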