Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is computing point distances so slow in Python?

My Python program was too slow. So, I profiled it and found that most of the time was being spent in a function that computes distance between two points (a point is a list of 3 Python floats):

def get_dist(pt0, pt1):
    val = 0
    for i in range(3):
        val += (pt0[i] - pt1[i]) ** 2
    val = math.sqrt(val)
    return val

To analyze why this function was so slow, I wrote two test programs: one in Python and one in C++ that do similar computation. They compute the distance between 1 million pairs of points. (The test code in Python and C++ is below.)

The Python computation takes 2 seconds, while C++ takes 0.02 seconds. A 100x difference!

Why is Python code so much slower than C++ code for such simple math computations? How do I speed it up to match the C++ performance?

The Python code used for testing:

import math, random, time

num = 1000000

# Generate random points and numbers

pt_list = []
rand_list = []

for i in range(num):
    pt = []
    for j in range(3):
        pt.append(random.random())
    pt_list.append(pt)
    rand_list.append(random.randint(0, num - 1))

# Compute

beg_time = time.clock()
dist = 0

for i in range(num):
    pt0 = pt_list[i]
    ri  = rand_list[i]
    pt1 = pt_list[ri]

    val = 0
    for j in range(3):
        val += (pt0[j] - pt1[j]) ** 2
    val = math.sqrt(val)

    dist += val

end_time = time.clock()
elap_time = (end_time - beg_time)

print elap_time
print dist

The C++ code used for testing:

#include <cstdlib>
#include <iostream>
#include <ctime>
#include <cmath>

struct Point
{
    double v[3];
};

int num = 1000000;

int main()
{
    // Allocate memory
    Point** pt_list = new Point*[num];
    int* rand_list = new int[num];

    // Generate random points and numbers
    for ( int i = 0; i < num; ++i )
    {
        Point* pt = new Point;

        for ( int j = 0; j < 3; ++j )
        {
            const double r = (double) rand() / (double) RAND_MAX;
            pt->v[j] = r;
        }

        pt_list[i] = pt;
        rand_list[i] = rand() % num;
    }

    // Compute

    clock_t beg_time = clock();
    double dist = 0;
    for ( int i = 0; i < num; ++i )
    {
        const Point* pt0 = pt_list[i];
        int r = rand_list[i];
        const Point* pt1 = pt_list[r];

        double val = 0;
        for ( int j = 0; j < 3; ++j )
        {
            const double d = pt0->v[j] - pt1->v[j];
            val += ( d * d );
        }

        val = sqrt(val);
        dist += val;
    }
    clock_t end_time = clock();
    double sec_time = (end_time - beg_time) / (double) CLOCKS_PER_SEC;

    std::cout << sec_time << std::endl;
    std::cout << dist << std::endl;

    return 0;
}
like image 485
Ashwin Nanjappa Avatar asked Apr 25 '13 09:04

Ashwin Nanjappa


People also ask

Why is Python slow in math?

The code is interpreted which introduces overhead on every operation. Not only that but Python can't assume the objects are numbers, it must inspect them to even know which operation to perform. For example you can use + on two numbers to add them, or you can use + on two strings to concatenate them.

How does Python calculate distance?

The math. dist() method returns the Euclidean distance between two points (p and q), where p and q are the coordinates of that point.

Are Python programs slow?

"Python is widely acknowledged as slow.

How does Python calculate 3D distance?

To calculate the distance between two points in 3D in Python, use the math. dist() method.


2 Answers

A sequence of optimizations:

The original code, with small changes

import math, random, time

num = 1000000

# Generate random points and numbers

# Change #1: Sometimes it's good not to have too much randomness.
# This is one of those cases.
# Changing the code shouldn't change the results.
# Using a fixed seed ensures that the changes are valid.
# The final 'print dist' should yield the same result regardless of optimizations.
# Note: There's nothing magical about this seed.
# I randomly picked a hash tag from a git log.
random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)
pt_list = []
rand_list = []

for i in range(num):
    pt = []
    for j in range(3):
        pt.append(random.random())
    pt_list.append(pt)

# Change #2: rand_list is computed in a separate loop.
# This ensures that upcoming optimizations will get the same results as
# this unoptimized version.
for i in range(num):
    rand_list.append(random.randint(0, num - 1))

# Compute

beg_time = time.clock()
dist = 0

for i in range(num):
    pt0 = pt_list[i]
    ri  = rand_list[i]
    pt1 = pt_list[ri]

    val = 0
    for j in range(3):
        val += (pt0[j] - pt1[j]) ** 2
    val = math.sqrt(val)

    dist += val

end_time = time.clock()
elap_time = (end_time - beg_time)

print elap_time
print dist


Optimization #1: Put the code in a function.

The first optimization (not shown) is to embed all of the code except the import in a function. This simple change offers a 36% performance boost on my computer.


Optimization #2: Eschew the ** operator.

You don't use pow(d,2) in your C code because everyone knows that this is suboptimal in C. It's also suboptimal in python. Python's ** is smart; it evaluates x**2 as x*x. However, it takes time to be smart. You know you want d*d, so use it. Here's the computation loop with that optimization:

for i in range(num):
    pt0 = pt_list[i]
    ri  = rand_list[i]
    pt1 = pt_list[ri]

    val = 0 
    for j in range(3):
        d = pt0[j] - pt1[j]
        val += d*d 
    val = math.sqrt(val)

    dist += val 


Optimization #3: Be pythonic.

Your Python code looks a whole lot like your C code. You aren't taking advantage of the language.

import math, random, time, itertools

def main (num=1000000) :
    # This small optimization speeds things up by a couple percent.
    sqrt = math.sqrt

    # Generate random points and numbers

    random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)

    def random_point () :
        return [random.random(), random.random(), random.random()]

    def random_index () :
       return random.randint(0, num-1)

    # Big optimization:
    # Don't generate the lists of points.
    # Instead use list comprehensions that create iterators.
    # It's best to avoid creating lists of millions of entities when you don't
    # need those lists. You don't need the lists; you just need the iterators.
    pt_list = [random_point() for i in xrange(num)]
    rand_pts = [pt_list[random_index()] for i in xrange(num)]


    # Compute

    beg_time = time.clock()
    dist = 0 

    # Don't loop over a range. That's too C-like.
    # Instead loop over some iterable, preferably one that doesn't create the
    # collection over which the iteration is to occur.
    # This is particularly important when the collection is large.
    for (pt0, pt1) in itertools.izip (pt_list, rand_pts) :

        # Small optimization: inner loop inlined,
        # intermediate variable 'val' eliminated.
        d0 = pt0[0]-pt1[0]
        d1 = pt0[1]-pt1[1]
        d2 = pt0[2]-pt1[2]

        dist += sqrt(d0*d0 + d1*d1 + d2*d2)

    end_time = time.clock()
    elap_time = (end_time - beg_time)

    print elap_time
    print dist


Update

Optimization #4, Use numpy

The following takes about 1/40th the time of the original version in the timed section of the code. Not quite as fast as C, but close.

Note the commented out, "Mondo slow" computation. That takes about ten times as long as the original version. There is an overhead cost with using numpy. The setup takes quite a bit longer in the code that follows compared to that in my non-numpy optimization #3.

Bottom line: You need to take care when using numpy, and the setup costs might be significant.

import numpy, random, time

def main (num=1000000) :

    # Generate random points and numbers

    random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)

    def random_point () :
        return [random.random(), random.random(), random.random()]

    def random_index () :
       return random.randint(0, num-1)

    pt_list = numpy.array([random_point() for i in xrange(num)])
    rand_pts = pt_list[[random_index() for i in xrange(num)],:]

    # Compute

    beg_time = time.clock()

    # Mondo slow.
    # dist = numpy.sum (
    #            numpy.apply_along_axis (
    #                numpy.linalg.norm, 1, pt_list - rand_pts))

    # Mondo fast.
    dist = numpy.sum ((numpy.sum ((pt_list-rand_pts)**2, axis=1))**0.5)

    end_time = time.clock()
    elap_time = (end_time - beg_time)

    print elap_time
    print dist
like image 152
David Hammen Avatar answered Sep 27 '22 21:09

David Hammen


Some general hints:

Move all your code into a main() function and use the normal

if __name__ == "__main__":
    main()

construct. It greatly improves speed due to variable-scoping. See Why does Python code run faster in a function? for an explanation of why.

Don't use range() since it generates the complete range at once which is slow for large numbers; instead use xrange() which uses a generator.

like image 40
Fredrik Pihl Avatar answered Sep 27 '22 20:09

Fredrik Pihl