
python numpy and memory efficiency (pass by reference vs. value)

I've recently been using Python more and more in place of C/C++ because it cuts my coding time by a factor of a few. At the same time, when I'm processing large amounts of data, my Python programs start to run a lot slower than the equivalent C. I'm wondering if this is due to me using large objects/arrays inefficiently. Is there a comprehensive guide to how memory is handled by NumPy/Python? When are things passed by reference and when by value, when are things copied and when not, and which types are mutable and which are not?

DilithiumMatrix asked Jul 26 '13 16:07



1 Answer

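Objects in Python (as in most mainstream languages) are passed by reference. More precisely, a function receives a reference to the caller's object rather than a copy of it, so mutating the argument inside the function changes the caller's object, while rebinding the parameter name does not.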
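A minimal sketch of what that means in practice. The function names (mutate, rebind, bump) are made up for illustration and do not come from any library:

```python
# Hypothetical helper names, used only to illustrate argument passing.

def mutate(items):
    # Changes the caller's list in place: same object, new contents.
    items.append(99)

def rebind(items):
    # Rebinds the local name only; the caller's list is untouched.
    items = [0]
    return items

def bump(n):
    # ints are immutable, so += builds a new int bound to the local name.
    n += 1
    return n

data = [1, 2, 3]
mutate(data)
print(data)    # [1, 2, 3, 99]

rebind(data)
print(data)    # [1, 2, 3, 99]

count = 5
bump(count)
print(count)   # 5
```

So nothing is copied on the way in, but only in-place changes to a mutable object are visible to the caller.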

If we take NumPy, for example, "new" arrays created by slicing existing ones are only views of the original (indexing with a list of integers or a boolean mask, by contrast, returns a copy). For example:

>>> import numpy as np
>>> vec_1 = np.array(range(10))
>>> vec_1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> vec_2 = vec_1[3:]  # let vec_2 be vec_1 from index 3 until the end
>>> vec_2
array([3, 4, 5, 6, 7, 8, 9])
>>> vec_2[3] = 10000  # write through the view
>>> vec_2
array([3, 4, 5, 10000, 7, 8, 9])
>>> vec_1  # the original array changed too
array([0, 1, 2, 3, 4, 5, 10000, 7, 8, 9])
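Not every kind of indexing gives you a view, though. Here is a small sketch (the variable names are mine) that writes into the result of each indexing style and then checks whether the original array noticed:

```python
import numpy as np

vec = np.arange(10)

view = vec[3:]              # basic slice: a view onto vec's memory
fancy = vec[[3, 4, 5]]      # integer-list ("fancy") indexing: a copy
masked = vec[vec > 6]       # boolean-mask indexing: a copy
explicit = vec[3:].copy()   # explicit copy of a slice

view[0] = -1       # writes through to vec[3]
fancy[0] = -2      # only changes the copy
masked[0] = -3     # only changes the copy
explicit[1] = -4   # only changes the copy

print(vec.tolist())   # [0, 1, 2, -1, 4, 5, 6, 7, 8, 9]
```

Only the write through the plain slice shows up in vec; the other three results own their data.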

NumPy has a handy function to help with your questions, called may_share_memory(a, b). So:

>>> np.may_share_memory(vec_1, vec_2)
True

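Just be careful, because the function can return false positives (although I have never seen one): by default it only checks whether the memory bounds of the two arrays overlap, not whether they actually share an element.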
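For what it's worth, a false positive is easy to provoke with two interleaved views. This sketch assumes a NumPy version that also provides np.shares_memory, which does the exact (and potentially slower) check:

```python
import numpy as np

a = np.arange(10)
evens = a[::2]    # elements 0, 2, 4, 6, 8
odds = a[1::2]    # elements 1, 3, 5, 7, 9

# The memory ranges overlap, but no element is actually shared.
print(np.may_share_memory(evens, odds))   # True  (false positive)
print(np.shares_memory(evens, odds))      # False (exact answer)

# A view and its parent really do share memory.
print(np.shares_memory(a, evens))         # True
```

So may_share_memory is a cheap "maybe", and shares_memory is the one to use when you need a definite answer.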

At SciPy 2013 there was a tutorial on NumPy (http://conference.scipy.org/scipy2013/tutorial_detail.php?id=100). Toward the end, the presenter talks a little about how NumPy handles memory. Watch it.

As a rule of thumb, objects are almost never passed by value by default, not even the ones stored inside another object. Another example, where a list takes a tour:

class SomeClass:

    def __init__(self, a_list):
        self.inside_list = a_list  # stores a reference, not a copy

    def get_list(self):
        return self.inside_list    # hands back the very same list object

>>> original_list = list(range(5))
>>> original_list
[0, 1, 2, 3, 4]
>>> my_object = SomeClass(original_list)
>>> output_list = my_object.get_list()
>>> output_list
[0, 1, 2, 3, 4]
>>> output_list[4] = 10000
>>> output_list
[0, 1, 2, 3, 10000]
>>> my_object.inside_list
[0, 1, 2, 3, 10000]
>>> original_list
[0, 1, 2, 3, 10000]

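Creepy, huh? Using the assignment symbol ("="), or returning an object at the end of a function, never copies anything: you always get another reference to the same object (or, with a NumPy slice, a view of a portion of it). Objects are only duplicated when you explicitly do so, using a copy method like some_dict.copy(), or a full slice like some_list[:]. Note that for NumPy arrays array[:] is still a view, so use array.copy() there. For example: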

>>> original_list = list(range(5))
>>> original_list
[0, 1, 2, 3, 4]
>>> my_object = SomeClass(original_list[:])  # pass a copy of the list
>>> output_list = my_object.get_list()
>>> output_list
[0, 1, 2, 3, 4]
>>> output_list[4] = 10000
>>> output_list
[0, 1, 2, 3, 10000]
>>> my_object.inside_list
[0, 1, 2, 3, 10000]
>>> original_list
[0, 1, 2, 3, 4]
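One caveat on explicit copies, sketched below with made-up variable names: a slice copy of a list is shallow, so nested objects are still shared unless you use copy.deepcopy, and a full slice of a NumPy array is a view rather than a copy.

```python
import copy
import numpy as np

nested = [[1, 2], [3, 4]]
shallow = nested[:]              # new outer list, same inner lists
deep = copy.deepcopy(nested)     # inner lists are duplicated too

shallow[0][0] = 100   # the inner list is shared, so nested changes
deep[1][1] = 400      # fully independent, nested is untouched

print(nested)         # [[100, 2], [3, 4]]

arr = np.arange(5)
alias = arr[:]        # for NumPy arrays a full slice is a view
dup = arr.copy()      # this is the actual copy

alias[0] = 7          # writes through to arr
dup[1] = 8            # only changes the copy

print(arr.tolist())   # [7, 1, 2, 3, 4]
```

For a flat list of numbers, as in the example above, the shallow copy is all you need.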

Got it?

Lucas Ribeiro answered Oct 14 '22 19:10
