Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python optimisations in this code?

I have two fairly simple code snippets and I'm running both of them a very large amount of times; I'm trying to determine if there's any optimisation I can do to speed up the execution time. If there's anything that stands out as something that could be done a lot quicker...

In the first one, we've got a list, fields. We've also got a list of lists, weights. We're trying to find which weight list multiplied by fields will produce the maximum sum. Fields is about 30k entries long.

def find_best(weights,fields):
  winner = -1
  best = -float('inf')
  for c in range(num_category):
    score = 0
    for i in range(num_fields):
      score += float(fields[i]) * weights[c][i]
    if score > best:
      best = score
      winner = c
  return winner

In the second one, we're trying to update two of our weight lists; one gets increased and one decreased. The amount to increase/decrease each element in the is equal to the corresponding element in fields (e.g. if fields[4] = 10.5, then we want to increase weights[toincrease][4] by 10.5 and decrease weights[todecrease][4] by 10.5)

 def update_weights(weights,fields,toincrease,todecrease):
   for i in range(num_fields):
     update = float(fields[i])
     weights[toincrease][i] += update
     weights[todecrease][i] -= update
   return weights

I hope this isn't an overly specific question.

like image 301
Fergusmac Avatar asked Nov 30 '22 14:11

Fergusmac


1 Answers

When you are trying to optimise, the thing you have to do is profile and measure! Python provides the timeit module which makes measuring things easy!

This will assume that you've converted fields to a list of floats beforehand (outside any of these functions), since the string → float conversion is very slow. You can do this via fields = [float(f) for f in string_fields].

Also, for doing numerical processing, pure python isn't very good, since it ends up doing a lot of type-checking (and some other stuff) for each operation. Using a C library like numpy will give massive improvements.

find_best

I have incorporated the answers of others (and a few more) into a profiling suite (say, test_find_best.py):

import random, operator, numpy as np, itertools, timeit

fields = [random.random() for _ in range(3000)]
fields_string = [str(field) for field in fields]
weights = [[random.random() for _ in range(3000)] for c in range(100)]

npw = np.array(weights)
npf = np.array(fields)   

num_fields = len(fields)
num_category = len(weights)

def f_original():
  winner = -1
  best = -float('inf')
  for c in range(num_category):
    score = 0
    for i in range(num_fields):
      score += float(fields_string[i]) * weights[c][i]
    if score > best:
      best = score
      winner = c

def f_original_no_string():
  winner = -1
  best = -float('inf')
  for c in range(num_category):
    score = 0
    for i in range(num_fields):
      score += fields[i] * weights[c][i]
    if score > best:
      best = score
      winner = c

def f_original_xrange():
  winner = -1
  best = -float('inf')
  for c in xrange(num_category):
    score = 0
    for i in xrange(num_fields):
      score += fields[i] * weights[c][i]
    if score > best:
      best = score
      winner = c


# Zenon  http://stackoverflow.com/a/10134298/1256624

def f_index_comprehension():
    winner = -1
    best = -float('inf')
    for c in range(num_category):
      score = sum(fields[i] * weights[c][i] for i in xrange(num_fields))
      if score > best:
        best = score
        winner = c  


# steveha  http://stackoverflow.com/a/10134247/1256624

def f_comprehension():
  winner = -1
  best = -float('inf')

  for c in xrange(num_category):
    score = sum(f * w for f, w in itertools.izip(fields, weights[c]))
    if score > best:
      best = score
      winner = c

def f_schwartz_original(): # https://en.wikipedia.org/wiki/Schwartzian_transform
    tup = max(((i, sum(t[0] * t[1] for t in itertools.izip(fields, wlist))) for i, wlist in enumerate(weights)),
              key=lambda t: t[1]
             )

def f_schwartz_opt(): # https://en.wikipedia.org/wiki/Schwartzian_transform
    tup = max(((i, sum(f * w for f,w in itertools.izip(fields, wlist))) for i, wlist in enumerate(weights)),
              key=operator.itemgetter(1)
             )

def fweight(field_float_list, wlist):
    f = iter(field_float_list)
    return sum(f.next() * w for w in wlist)

def f_schwartz_iterate():
     tup = max(
         ((i, fweight(fields, wlist)) for i, wlist in enumerate(weights)),
         key=lambda t: t[1]
      )

# Nolen Royalty  http://stackoverflow.com/a/10134147/1256624 

def f_numpy_mult_sum():
   np.argmax(np.sum(npf * npw, axis = 1))


# me

def f_imap():
  winner = -1
  best = -float('inf')

  for c in xrange(num_category):
    score = sum(itertools.imap(operator.mul, fields, weights[c]))
    if score > best:
      best = score
      winner = c

def f_numpy():
   np.argmax(npw.dot(npf))



for f in [f_original,
          f_index_comprehension,
          f_schwartz_iterate,
          f_original_no_string,
          f_schwartz_original,
          f_original_xrange,
          f_schwartz_opt,
          f_comprehension,
          f_imap]:
   print "%s: %.2f ms" % (f.__name__, timeit.timeit(f,number=10)/10 * 1000)
for f in [f_numpy_mult_sum, f_numpy]:
   print "%s: %.2f ms" % (f.__name__, timeit.timeit(f,number=100)/100 * 1000)

Running python test_find_best.py gives me:

f_original: 310.34 ms
f_index_comprehension: 102.58 ms
f_schwartz_iterate: 103.39 ms
f_original_no_string: 96.36 ms
f_schwartz_original: 90.52 ms
f_original_xrange: 89.31 ms
f_schwartz_opt: 69.48 ms
f_comprehension: 68.87 ms
f_imap: 53.33 ms
f_numpy_mult_sum: 3.57 ms
f_numpy: 0.62 ms

So the numpy version using .dot (sorry, I can't find the documentation for it atm) is the fastest. If you are doing a lot of numerical operations (which it seems you are), it might be worth converting fields and weights as numpy arrays as soon as you create them.

update_weights

Numpy is likely to offer a similar speed-up for update_weights, doing something like:

def update_weights(weights, fields, to_increase, to_decrease):
  weights[to_increase,:] += fields
  weights[to_decrease,:] -= fields
  return weights

(I haven't tested or profiled that btw, you need to do that.)

like image 130
huon Avatar answered Dec 06 '22 01:12

huon