Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the most efficient way to add an element to a list only if isn't there yet?

I have the following code in Python:

def point_to_index(point):
    if point not in points:
        points.append(point)
    return points.index(point)

This code is awfully inefficient, especially since I expect points to grow to hold a few million elements.

If the point isn't in the list, I traverse the list 3 times:

  1. look for it and decide it isn't there
  2. go to the end of the list and add a new element
  3. go to the end of the list until I find the index

If it is in the list, I traverse it twice: 1. look for it and decide it is there 2. go almost to the end of the list until I find the index

Is there any more efficient way to do this? For instance, I know that:

  • I'm more likely to call this function with a point that isn't in the list.
  • If the point is in the list, it's likelier to be near the end than in the beginning.

So if I could have the line:

if point not in points:

search the list from the end to the beginning it would improve performance when the point is already in the list.

However, I don't want to do:

if point not in reversed(points):

because I imagine that reversed(points) itself will come at a huge cost.

Nor do I want to add new points to the beginning of the list (assuming I knew how to do that in Python) because that would change the indices, which must remain constant for the algorithm to work.

The only improvement I can think of is to implement the function with only one pass, if possible from the end to the beginning. The bottom line is:

  • Is there a good way to do this?
  • Is there a better way to optimize the function?

Edit: I've gotten suggestions for implementing this with only one pass. Is there any way for index() to go from the end to the beginning?

Edit: People have asked why the index is critical. I'm trying to describe a 3D surface using the OFF file format. This format describes a surface using its vertices and faces. First the vertices are listed, and the faces are described using a list of indices of vertices. That's why once I add a vortex to the list, its index must not change.

Edit: There have been some suggestions (such as igor's) to use a dict. This is a good solution for scanning the list. However, when I'm done I need to print out the list in the same order it was created. If I use a dict, I need to print out its keys sorted by value. Is there a good way to do that?

Edit: I implemented www.brool.com's suggestion. This was the simplest and fastest. It is essentially an ordered Dict, but without the overhead. The performance is great!

like image 861
Nathan Fellman Avatar asked Aug 23 '09 18:08

Nathan Fellman


Video Answer


2 Answers

You want to use a set:

>>> x = set()
>>> x
set([])
>>> x.add(1)
>>> x
set([1])
>>> x.add(1)
>>> x
set([1])

A set contains only one instance of any item you add, and it will be a lot more efficient than iterating a list manually.

This wikibooks page looks like a good primer if you haven't used sets in Python before.

like image 149
Mark Rushakoff Avatar answered Sep 17 '22 19:09

Mark Rushakoff


This will traverse at most once:

def point_to_index(point):
    try: 
        return points.index(point)
    except ValueError:
        points.append(point)
        return len(points)-1

You may also want to try this version, which takes into account that matches are likely to be near the end of the list. Note that reversed() has almost no cost even on very large lists - it does not create a copy and does not traverse the list more than once.

def point_to_index(point):
    for index, this_point in enumerate(reversed(points)):
        if point == this_point:
            return len(points) - (index+1)
    else:
        points.append(point)
        return len(points)-1

You might also consider keeping a parallel dict or set of points to check for membership, since both of those types can do membership tests in O(1). There would be, of course, a substantial memory cost.

Obviously, if the points were ordered somehow, you would have many other options for speeding this code up, notably using a binary search for membership tests.

like image 40
Kenan Banks Avatar answered Sep 18 '22 19:09

Kenan Banks