My Python program was too slow. So I profiled it and found that most of the time was being spent in a function that computes the distance between two points (a point is a list of 3 Python floats):
def get_dist(pt0, pt1):
    val = 0
    for i in range(3):
        val += (pt0[i] - pt1[i]) ** 2
    val = math.sqrt(val)
    return val
To analyze why this function was so slow, I wrote two test programs, one in Python and one in C++, that do a similar computation: they compute the distance between 1 million pairs of points. (The test code in both languages is below.)
The Python computation takes 2 seconds, while C++ takes 0.02 seconds. A 100x difference!
Why is Python code so much slower than C++ code for such simple math computations? How do I speed it up to match the C++ performance?
The Python code used for testing:
import math, random, time

num = 1000000

# Generate random points and numbers
pt_list = []
rand_list = []
for i in range(num):
    pt = []
    for j in range(3):
        pt.append(random.random())
    pt_list.append(pt)
    rand_list.append(random.randint(0, num - 1))

# Compute
beg_time = time.clock()
dist = 0
for i in range(num):
    pt0 = pt_list[i]
    ri = rand_list[i]
    pt1 = pt_list[ri]
    val = 0
    for j in range(3):
        val += (pt0[j] - pt1[j]) ** 2
    val = math.sqrt(val)
    dist += val
end_time = time.clock()
elap_time = (end_time - beg_time)
print elap_time
print dist
The C++ code used for testing:
#include <cstdlib>
#include <iostream>
#include <ctime>
#include <cmath>

struct Point
{
    double v[3];
};

int num = 1000000;

int main()
{
    // Allocate memory
    Point** pt_list = new Point*[num];
    int* rand_list = new int[num];

    // Generate random points and numbers
    for ( int i = 0; i < num; ++i )
    {
        Point* pt = new Point;
        for ( int j = 0; j < 3; ++j )
        {
            const double r = (double) rand() / (double) RAND_MAX;
            pt->v[j] = r;
        }
        pt_list[i] = pt;
        rand_list[i] = rand() % num;
    }

    // Compute
    clock_t beg_time = clock();
    double dist = 0;
    for ( int i = 0; i < num; ++i )
    {
        const Point* pt0 = pt_list[i];
        int r = rand_list[i];
        const Point* pt1 = pt_list[r];
        double val = 0;
        for ( int j = 0; j < 3; ++j )
        {
            const double d = pt0->v[j] - pt1->v[j];
            val += ( d * d );
        }
        val = sqrt(val);
        dist += val;
    }
    clock_t end_time = clock();
    double sec_time = (end_time - beg_time) / (double) CLOCKS_PER_SEC;
    std::cout << sec_time << std::endl;
    std::cout << dist << std::endl;
    return 0;
}
The code is interpreted, which introduces overhead on every operation. On top of that, Python can't assume the objects are numbers; it must inspect them at run time just to know which operation to perform. For example, + on two numbers adds them, while + on two strings concatenates them.
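A minimal illustration of that run-time dispatch:

```python
# The same + operator means different things depending on operand types,
# and CPython has to discover which meaning applies on every operation.
print(2 + 3)        # 5
print("2" + "3")    # 23
print([2] + [3])    # [2, 3]
```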
(Aside: since Python 3.8 the standard library provides math.dist(p, q), which returns the Euclidean distance between two points directly.)
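As a sketch, for a single pair of points (math.dist requires Python 3.8 or later):

```python
import math

p = [1.0, 2.0, 3.0]
q = [4.0, 6.0, 3.0]

# math.dist computes sqrt of the sum of squared coordinate differences
print(math.dist(p, q))  # 5.0
```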
A sequence of optimizations:
import math, random, time

num = 1000000

# Generate random points and numbers
# Change #1: Sometimes it's good not to have too much randomness.
# This is one of those cases.
# Changing the code shouldn't change the results.
# Using a fixed seed ensures that the changes are valid.
# The final 'print dist' should yield the same result regardless of optimizations.
# Note: There's nothing magical about this seed.
# I randomly picked a commit hash from a git log.
random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)

pt_list = []
rand_list = []
for i in range(num):
    pt = []
    for j in range(3):
        pt.append(random.random())
    pt_list.append(pt)

# Change #2: rand_list is computed in a separate loop.
# This ensures that upcoming optimizations will get the same results as
# this unoptimized version.
for i in range(num):
    rand_list.append(random.randint(0, num - 1))

# Compute
beg_time = time.clock()
dist = 0
for i in range(num):
    pt0 = pt_list[i]
    ri = rand_list[i]
    pt1 = pt_list[ri]
    val = 0
    for j in range(3):
        val += (pt0[j] - pt1[j]) ** 2
    val = math.sqrt(val)
    dist += val
end_time = time.clock()
elap_time = (end_time - beg_time)
print elap_time
print dist
The first optimization (not shown) is to embed all of the code except the import
in a function. This simple change offers a 36% performance boost on my computer.
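Why wrapping the code in a function helps can be seen with the dis module; this is a sketch (the function f here is just an illustration, not the benchmark code):

```python
import dis

def f():
    total = 0
    for i in range(10):
        total += i
    return total

# Locals inside a function compile to LOAD_FAST/STORE_FAST opcodes, which
# index a fixed-size array of local slots; the same statements at module
# scope would compile to LOAD_NAME/STORE_NAME dictionary lookups, which
# are slower on every single access.
ops = {ins.opname for ins in dis.get_instructions(f)}
print(any(op.startswith("LOAD_FAST") for op in ops))  # True
```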
The next optimization is to avoid the ** operator. You don't use pow(d,2) in your C code because everyone knows that this is suboptimal in C. It's also suboptimal in Python. Python's ** is smart; it evaluates x**2 as x*x. However, it takes time to be smart. You know you want d*d, so use it. Here's the computation loop with that optimization:
for i in range(num):
    pt0 = pt_list[i]
    ri = rand_list[i]
    pt1 = pt_list[ri]
    val = 0
    for j in range(3):
        d = pt0[j] - pt1[j]
        val += d*d
    val = math.sqrt(val)
    dist += val
Your Python code looks a whole lot like your C code. You aren't taking advantage of the language.
import math, random, time, itertools

def main (num=1000000) :
    # This small optimization speeds things up by a couple percent.
    sqrt = math.sqrt

    # Generate random points and numbers
    random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)

    def random_point () :
        return [random.random(), random.random(), random.random()]

    def random_index () :
        return random.randint(0, num-1)

    # Big optimization:
    # Build the data with list comprehensions instead of explicit append
    # loops, and look up each random partner point once, up front.
    # (pt_list itself must stay a real list because it is indexed below;
    # the win is eliminating the Python-level append loops.)
    pt_list = [random_point() for i in xrange(num)]
    rand_pts = [pt_list[random_index()] for i in xrange(num)]

    # Compute
    beg_time = time.clock()
    dist = 0
    # Don't loop over a range of indices. That's too C-like.
    # Instead, iterate over the data directly with a lazy iterator:
    # itertools.izip pairs the two lists without building a new collection.
    # This is particularly important when the collections are large.
    for (pt0, pt1) in itertools.izip (pt_list, rand_pts) :
        # Small optimization: inner loop inlined,
        # intermediate variable 'val' eliminated.
        d0 = pt0[0]-pt1[0]
        d1 = pt0[1]-pt1[1]
        d2 = pt0[2]-pt1[2]
        dist += sqrt(d0*d0 + d1*d1 + d2*d2)
    end_time = time.clock()
    elap_time = (end_time - beg_time)
    print elap_time
    print dist
The following takes about 1/40th the time of the original version in the timed section of the code. Not quite as fast as C, but close.
Note the commented-out "Mondo slow" computation. That takes about ten times as long as the original version. There is an overhead cost to using numpy: the setup in the code that follows takes quite a bit longer than in my non-numpy optimization above.
Bottom line: take care when using numpy; the setup costs can be significant.
import numpy, random, time

def main (num=1000000) :
    # Generate random points and numbers
    random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)

    def random_point () :
        return [random.random(), random.random(), random.random()]

    def random_index () :
        return random.randint(0, num-1)

    pt_list = numpy.array([random_point() for i in xrange(num)])
    rand_pts = pt_list[[random_index() for i in xrange(num)],:]

    # Compute
    beg_time = time.clock()
    # Mondo slow.
    # dist = numpy.sum (
    #     numpy.apply_along_axis (
    #         numpy.linalg.norm, 1, pt_list - rand_pts))
    # Mondo fast.
    dist = numpy.sum ((numpy.sum ((pt_list-rand_pts)**2, axis=1))**0.5)
    end_time = time.clock()
    elap_time = (end_time - beg_time)
    print elap_time
    print dist
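As a cross-check, the same vectorized computation can also be written with numpy.linalg.norm and its axis argument, which avoids the slow apply_along_axis path entirely. This sketch uses a smaller array and numpy's default_rng for setup; that setup is my assumption, not the original benchmark's:

```python
import numpy as np

rng = np.random.default_rng(0)
num = 10000
pt_list = rng.random((num, 3))
rand_pts = pt_list[rng.integers(0, num, size=num)]

# Per-row Euclidean norms in one vectorized call, then summed.
dist = np.linalg.norm(pt_list - rand_pts, axis=1).sum()

# Same result as the explicit sum-of-squares formulation.
check = np.sum(np.sum((pt_list - rand_pts) ** 2, axis=1) ** 0.5)
print(np.isclose(dist, check))  # True
```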
Some general hints:
Move all your code into a main() function and use the normal

if __name__ == "__main__":
    main()

construct. It greatly improves speed due to variable scoping. See "Why does Python code run faster in a function?" for an explanation of why.
Don't use range(), since it builds the complete list of numbers at once, which is slow for large counts; use xrange() instead, which yields values lazily. (In Python 3, range() is already lazy and xrange() no longer exists.)
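The listings above are written for Python 2 (print statements, xrange, itertools.izip, time.clock). A Python 3 port of the optimized loop might look like the following sketch; zip and range are lazy in Python 3, time.perf_counter replaces the removed time.clock, and the seed value 0 and the smaller default num are arbitrary choices of mine:

```python
import math, random, time

def main(num=100000):
    random.seed(0)  # any fixed seed keeps results reproducible
    pt_list = [[random.random() for _ in range(3)] for _ in range(num)]
    rand_pts = [pt_list[random.randrange(num)] for _ in range(num)]
    sqrt = math.sqrt  # bind the attribute lookup once

    beg_time = time.perf_counter()
    dist = 0.0
    for pt0, pt1 in zip(pt_list, rand_pts):  # zip is lazy in Python 3
        d0 = pt0[0] - pt1[0]
        d1 = pt0[1] - pt1[1]
        d2 = pt0[2] - pt1[2]
        dist += sqrt(d0 * d0 + d1 * d1 + d2 * d2)
    print(time.perf_counter() - beg_time)
    print(dist)
    return dist

if __name__ == "__main__":
    main()
```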