
Improve performance of list creation

How can I significantly improve the speed of the following code? Can map, NumPy, matrix operations, or something else be used efficiently to avoid the for loop?

import time

def func(x):
    if x%2 == 0:
        return 'even'
    else:
        return 'odd'

starttime = time.time()

MAX=1000000

y=list(range(MAX))

for n in range(MAX):
    y[n]=[n,n**2,func(n)]

print('That took {} seconds'.format(time.time() - starttime))

The following replacement does not improve the speed:

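import numpy as np
r = np.array(range(MAX))
labels = ['even', 'odd']  # index 0 -> 'even', index 1 -> 'odd'; also avoids shadowing the built-in str
# map() with a lambda still runs Python code once per element
result = np.array([r, r ** 2, list(map(lambda x: labels[x % 2], r))])
y = result.T  # mixing ints and strings in one array turns every entry into a string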
len asked Dec 11 '22 00:12



2 Answers

I think you can do it this way. The idea is to use NumPy built-in functions as much as possible:

%%timeit
y = np.arange(MAX)
y_2 = y**2
y_str = np.where(y%2==0,'even','odd')

res = np.rec.fromarrays((y,y_2,y_str), names=('y', 'y_2', 'y_str'))

#
# Some examples for working with the record array
res[3]
# (3, 9, 'odd')
res[:3]
# rec.array([(0, 0, 'even'), (1, 1, 'odd'), (2, 4, 'even')],
#           dtype=[('y', '<i8'), ('y_2', '<i8'), ('y_str', '<U4')])
res['y_str'][:7]
# array(['even', 'odd', 'even', 'odd', 'even', 'odd', 'even'], dtype='<U4')
res.y_2[:7]
# array([ 0,  1,  4,  9, 16, 25, 36])

I have run several tests, and it is significantly faster.
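
For anyone who wants to reproduce such a comparison outside a notebook, here is a minimal sketch using the standard timeit module. The helper names build_with_loop and build_with_numpy, the smaller MAX, and the explicit int64 dtype are assumptions made for illustration, not part of the answer above. Timings will vary by machine, so none are quoted.

```python
import timeit
import numpy as np

MAX = 100_000  # assumption: smaller than the question's 1,000,000 to keep the run short

def parity(x):
    return 'even' if x % 2 == 0 else 'odd'

def build_with_loop(n):
    # Same content as the question's loop: a list of [n, n**2, label] rows
    return [[i, i ** 2, parity(i)] for i in range(n)]

def build_with_numpy(n):
    # Explicit int64 so the squares cannot overflow a 32-bit default integer
    y = np.arange(n, dtype=np.int64)
    y_2 = y ** 2
    y_str = np.where(y % 2 == 0, 'even', 'odd')
    return np.rec.fromarrays((y, y_2, y_str), names=('y', 'y_2', 'y_str'))

loop_rows = build_with_loop(MAX)
res = build_with_numpy(MAX)

# Average over a few runs; timeit.timeit returns the total time in seconds
runs = 3
loop_time = timeit.timeit(lambda: build_with_loop(MAX), number=runs) / runs
numpy_time = timeit.timeit(lambda: build_with_numpy(MAX), number=runs) / runs
print(f'loop:  {loop_time:.4f} s per run')
print(f'numpy: {numpy_time:.4f} s per run')
```

Both builders hold the same values, so the comparison measures only how the data is constructed.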

meTchaikovsky answered Dec 24 '22 22:12



For large arrays of a single type, NumPy is the way to go. But np.where is relatively slow, so if you just want 'odd' and 'even', you can use np.tile or something like it:

MAX = 1000000

%%timeit
y = np.arange(MAX)
ystr = np.where(y%2==0,'even','odd')
#  14.9 ms ± 61.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
temp = np.array(['even', 'odd'])
ystr = np.tile(temp, MAX//2)
# 4.1 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So tile is about 3-4x faster.

If you want something more complex, I'd still try to avoid np.where when speed matters. There's almost always a way: the logic inside a where call is usually simple, so it's easy to rewrite it as an expression between NumPy arrays. (To be clear, NumPy with np.where is still much faster than pure Python lists; it's just usually slow relative to other NumPy options.)
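
As one possible illustration of rewriting a where call as an expression between arrays, the sketch below uses y % 2 directly as an index into a two-element label array. The names labels, ystr_indexed, and ystr_where are hypothetical, and whether this beats np.where on a given machine is something to measure, not assume.

```python
import numpy as np

MAX = 1_000_000
y = np.arange(MAX)

# y % 2 is always 0 or 1, so it can index straight into the label array
labels = np.array(['even', 'odd'])
ystr_indexed = labels[y % 2]

# The np.where version, kept only to check that both give the same result
ystr_where = np.where(y % 2 == 0, 'even', 'odd')
same = bool(np.array_equal(ystr_indexed, ystr_where))
print(same)
```

The same pattern extends to more than two categories: compute an integer code per element, then index a label array with it.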

The others are fairly obvious:

y = np.arange(MAX) 
y2 = y**2

Personally, I'd just stick these together in a list:

result = [y, y2, ystr]

Putting this all together (using tile), I get:

# 6.82 ms ± 84.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
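
For reference, the pieces above could be assembled roughly as follows. This is a sketch, not the exact code that was timed: the function name build_columns, the explicit int64 dtype, and the trimming step (so that odd lengths also work) are assumptions added here.

```python
import numpy as np

def build_columns(n):
    # Explicit int64 so the squares cannot overflow a 32-bit default integer
    y = np.arange(n, dtype=np.int64)
    y2 = y ** 2
    pair = np.array(['even', 'odd'])
    # Tile one repeat more than needed, then trim, so odd n is handled too
    ystr = np.tile(pair, n // 2 + 1)[:n]
    return [y, y2, ystr]

y, y2, ystr = build_columns(7)

# If row-wise tuples like the question's [n, n**2, label] are needed,
# they can be rebuilt from the columns (this conversion is back in pure Python)
rows = list(zip(y.tolist(), y2.tolist(), ystr.tolist()))
print(rows[3])
```

Keeping the three columns as separate arrays avoids forcing integers and strings into one array with a common dtype.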
tom10 answered Dec 24 '22 21:12
