I'm doing some performance-critical Python work and want to create a function that removes a few elements from a list if they meet certain criteria. I'd rather not create any copies of the list because it's filled with a lot of really large objects.
Functionality I want to implement:
def listCleanup(listOfElements):
    i = 0
    for element in listOfElements:
        if(element.meetsCriteria()):
            del(listOfElements[i])
        i += 1
    return listOfElements
myList = range(10000)
myList = listCleanup(myList)
I'm not familiar with the low-level workings of Python. Is myList being passed by value or by reference?
How can I make this faster?
Is it possible to somehow extend the list class and implement listCleanup() within that?
myList = range(10000)
myList.listCleanup()
Thanks-
Jonathan
Python passes everything the same way, but calling it "by value" or "by reference" will not clear everything up, since Python's semantics differ from those of the languages for which those terms usually apply. If I were to describe it, I would say that all passing is by value, and that the value is an object reference. (This is why I didn't want to say it!)
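A tiny sketch of what that means in practice (the function and variable names here are made up for illustration): mutating the object a parameter refers to is visible to the caller, while rebinding the parameter name to a new object is not.

def mutate(a_list):
    # Mutates the object that the caller's name also refers to
    a_list.append(99)

def rebind(a_list):
    # Rebinds only the local name; the caller's list is untouched
    a_list = [1, 2, 3]

items = [0]
mutate(items)
rebind(items)
print(items)  # [0, 99] -- the append is visible, the rebinding is not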
If you want to filter out some stuff from a list, you build a new list:

foo = range(100000)
new_foo = []
for item in foo:
    if item % 3 != 0:  # Things divisible by 3 don't get through
        new_foo.append(item)
or, using the list comprehension syntax
new_foo = [item for item in foo if item % 3 != 0]
Python will not copy the objects in the list; rather, both foo and new_foo will reference the same objects. (Python never implicitly copies any objects.)
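A quick way to see that, using an arbitrary list of objects as a stand-in for your data:

big = [object() for _ in range(5)]
kept = [obj for obj in big]  # a new list, but no new objects

print(kept is big)        # False -- two distinct list objects
print(kept[0] is big[0])  # True  -- they hold the very same elements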
You have suggested you have performance concerns about this operation. Deleting items from the old list with repeated del statements will not only result in code that is less idiomatic and more confusing to deal with, it will also introduce quadratic performance, because the remainder of the list must be shifted down each time an element is deleted.
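A rough timing sketch of that difference, using divisibility by 3 as a placeholder criterion and arbitrary sizes:

import timeit

def remove_with_del(items):
    # Deletes matching items in place; every del shifts the tail, so this is O(n**2) overall
    i = 0
    while i < len(items):
        if items[i] % 3 == 0:
            del items[i]
        else:
            i += 1
    return items

def remove_with_rebuild(items):
    # Builds a new list of the survivors in a single O(n) pass
    return [x for x in items if x % 3 != 0]

print(timeit.timeit(lambda: remove_with_del(list(range(20000))), number=5))
print(timeit.timeit(lambda: remove_with_rebuild(list(range(20000))), number=5))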
To address performance:
Get it up and running. You can't figure out what your performance is like unless you have working code. This will also tell you whether it is speed or space that you must optimize for; you mention concerns about both in your question, but oftentimes optimization involves gaining one at the cost of the other.
Profile. The stdlib has tools for profiling time (cProfile, timeit). There are various third-party memory profilers that can be somewhat useful, but they aren't quite as nice to work with.
Measure. Re-time or re-profile memory when you make a change, to see whether the change is an improvement and, if so, how big it is.
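As a rough sketch of the profiling step, using the stdlib's cProfile and pstats (the workload function and data here are placeholders, not your actual code):

import cProfile
import pstats

def workload(items):
    # Stand-in for the real filtering work; profile whatever function you actually care about
    return [x for x in items if x % 3 != 0]

data = list(range(100000))

# Run the statement under the profiler and dump per-function timings to a file
cProfile.run('workload(data)', 'workload.prof')

# Sort by cumulative time and show the ten most expensive functions
pstats.Stats('workload.prof').sort_stats('cumulative').print_stats(10)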
To make your code more memory-sensitive, you will often want a paradigm shift in how you store your data, not micro-optimizations like avoiding a second list when filtering. (The same is true for time, really: switching to a better algorithm will almost always give the best speedup. However, it's harder to generalize about speed optimizations.)
Some common paradigm shifts to optimize memory consumption in Python include
Using generators. Generators are lazy iterables: they don't load a whole list into memory at once, they figure out what their next items are on the fly. To use generators, the snippets above would look like

foo = xrange(100000)  # Like generators, xrange is lazy

def filter_divisible_by_three(iterable):
    for item in iterable:
        if item % 3 != 0:
            yield item

new_foo = filter_divisible_by_three(foo)
or, using the generator expression syntax,
new_foo = (item for item in foo if item % 3 != 0)
Using numpy for homogeneous sequences, especially numerical ones. This can also speed up code that does lots of vector operations (a short sketch follows this list).
Storing data to disk, such as in a database.
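A minimal numpy sketch of the same filtering, assuming the data really is numeric (the array size and the divisible-by-3 criterion are just placeholders):

import numpy as np

# One homogeneous array instead of a list of separate Python int objects
foo = np.arange(100000)

# Boolean-mask filtering; the comparison and the selection are vectorized in C
new_foo = foo[foo % 3 != 0]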
In Python, lists are effectively passed by reference: the function receives a reference to the same list object, and nothing is copied.
The size of the objects in the list doesn't affect the list's performance, because the list only stores references to the objects. However, the number of items in the list does affect the performance of some operations, such as removing an element, which is O(n).
As written, listCleanup is worst-case O(n**2), since you have the O(n) del operation within a loop that is potentially O(n) itself.
If the order of the elements doesn't matter, you may be able to use the built-in set type instead of a list. A set has O(1) deletions and insertions. However, you will have to ensure that your objects are immutable and hashable.
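A small sketch of that option (the integer elements are just placeholders; your own objects would need to be hashable to behave this way):

elements = set(range(10000))

# Removing a member of a set is O(1) on average; discard() is a no-op if the value is absent
elements.discard(4217)

# Membership tests are also O(1) on average
print(4217 in elements)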
Otherwise, you're better off recreating the list. That's O(n), and your algorithm needs to be at least O(n) since you need to examine every element. You can filter the list in one line like this:
listOfElements[:] = [el for el in listOfElements if not el.meetsCriteria()]
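A quick illustration of why the [:] matters (the names here are made up): the slice assignment replaces the contents of the existing list object, so every other reference to that list sees the filtered result.

data = list(range(10))
alias = data

# Rebuild the contents in place; keeps only the even numbers in this toy example
data[:] = [x for x in data if x % 2 == 0]

print(alias)           # [0, 2, 4, 6, 8] -- the alias sees the change
print(alias is data)   # True -- still the same list object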