def removeDuplicatesFromList(seq):
    # Not order preserving
    keys = {}
    for e in seq:
        keys[e] = 1
    return keys.keys()
def countWordDistances(li):
    '''
    If li = ['that','sank','into','the','ocean']
    This function would return: { that:1, sank:2, into:3, the:4, ocean:5 }
    However, if there is a duplicate term, take the average of their positions
    '''
    wordmap = {}
    unique_words = removeDuplicatesFromList(li)
    for w in unique_words:
        # 1-based positions of every occurrence of w (one full pass per word)
        distances = [i+1 for i, x in enumerate(li) if x == w]
        wordmap[w] = float(sum(distances)) / float(len(distances))  # take average
    return wordmap
How do I make this function faster?
A list comprehension is basically just syntactic sugar for a regular for loop. The reason it performs better here is that it doesn't need to look up the list's append attribute and call it as a function on every iteration.
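For instance, a minimal micro-benchmark (the function names here are hypothetical) contrasting the two forms with timeit:

import timeit

def with_loop(n):
    # Looks up and calls the append method on every iteration.
    result = []
    for i in range(n):
        result.append(i * 2)
    return result

def with_comprehension(n):
    # Same result, but the append step happens in C inside the comprehension.
    return [i * 2 for i in range(n)]

print(timeit.timeit(lambda: with_loop(10000), number=100))
print(timeit.timeit(lambda: with_comprehension(10000), number=100))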
Generally, lists are faster than sets. But when searching for an element in a collection, sets are faster, because sets are implemented using hash tables.
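A quick sketch of that difference: a membership test against a list scans it element by element, while a set does a hash lookup.

import timeit

words_list = ['w%d' % i for i in range(10000)]
words_set = set(words_list)

# Worst case for the list: the target is at the end, so every test scans it all.
print(timeit.timeit(lambda: 'w9999' in words_list, number=1000))
print(timeit.timeit(lambda: 'w9999' in words_set, number=1000))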
Tuples are immutable, so they don't require extra space to store new objects. Lists are allocated in two blocks: a fixed one with all the Python object information, and a variable-sized block for the data. This is why creating a tuple is faster than creating a list.
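As a rough illustration (exact numbers vary by interpreter), timing the two literals shows the gap; CPython can even fold a constant tuple literal into a single precomputed constant:

import timeit

print(timeit.timeit('(1, 2, 3, 4, 5)', number=1000000))  # tuple literal
print(timeit.timeit('[1, 2, 3, 4, 5]', number=1000000))  # list literal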
import collections

def countWordDistances(li):
    # Map each word to the list of its 1-based positions.
    wordmap = collections.defaultdict(list)
    for i, w in enumerate(li, 1):
        wordmap[w].append(i)
    # Replace each position list with its average.
    for k, v in wordmap.iteritems():  # use .items() on Python 3
        wordmap[k] = sum(v) / float(len(v))
    return wordmap
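For example, on a small input with one duplicate ('that' appears at positions 1 and 6, averaging to 3.5); key order may vary by Python version:

li = ['that', 'sank', 'into', 'the', 'ocean', 'that']
print(countWordDistances(li))
# {'that': 3.5, 'sank': 2.0, 'into': 3.0, 'the': 4.0, 'ocean': 5.0}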
This makes only one pass through the list, and keeps operations to a minimum. I timed this on a word list with 1.1M entries, 29k unique words, and it was almost twice as fast as Patrick's answer. On a list of 10k words, 2k unique, it was more than 300x faster than the OP's code.
To make Python code go faster, there are two rules to keep in mind: use the best algorithm, and avoid Python.
On the algorithm front, iterating the list once instead of N+1 times (N= number of unique words) is the main thing that will speed this up.
On the "avoid Python" front, I mean: you want your code to be executing in C as much as possible. So using defaultdict
is better than a dict where you explicitly check if the key is present. defaultdict
does that check for you, but does it in C, in the Python implementation. enumerate
is better than for i in range(len(li))
, again because it's fewer Python steps. And enumerate(li, 1)
makes the counting start at 1 instead of having to have a Python +1 somewhere in the loop.
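For contrast, here is a sketch of the same logic written with the slower "pure Python" patterns this paragraph advises against; each commented line is an extra Python-level step that defaultdict and enumerate avoid:

def countWordDistances_slow(li):
    wordmap = {}
    for i in range(len(li)):       # extra Python-level indexing vs enumerate
        w = li[i]
        if w not in wordmap:       # explicit key check done in Python, not C
            wordmap[w] = []
        wordmap[w].append(i + 1)   # manual +1 vs enumerate(li, 1)
    for k in wordmap:
        wordmap[k] = sum(wordmap[k]) / float(len(wordmap[k]))
    return wordmap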
Edit: third rule: use PyPy. My code goes twice as fast on PyPy as on CPython 2.7.