improve performance of list creation

Question

How can I significantly improve the speed of the following code? Can mapping, numpy, matrix operations, or something else be used efficiently to avoid the for loop?

import time

def func(x):
    if x%2 == 0:
        return 'even'
    else:
        return 'odd'

starttime = time.time()

MAX=1000000

y=list(range(MAX))

for n in range(MAX):
    y[n]=[n,n**2,func(n)]

print('That took {} seconds'.format(time.time() - starttime))

The following replacement does not improve the speed:

import numpy as np
r = np.array(range(MAX))
labels = ['odd', 'even']  # renamed from `str`, which shadowed the built-in type
result = np.array([r, r ** 2, list(map(lambda x: labels[x % 2], r))])
y = result.T

meTchaikovsky · Accepted Answer

I think you can do it this way. The idea is to use numpy built-in functions as much as possible:

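%%timeit
# Assumes `import numpy as np` and MAX = 1000000, as in the question
y = np.arange(MAX)
y_2 = y**2                               # element-wise square
y_str = np.where(y%2==0,'even','odd')    # vectorized replacement for func()

# Combine the three columns into one record array with named fields
res = np.rec.fromarrays((y,y_2,y_str), names=('y', 'y_2', 'y_str'))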

# Some examples for working with the record array
res[3]
# (3, 9, 'odd')
res[:3]
# rec.array([(0, 0, 'even'), (1, 1, 'odd'), (2, 4, 'even')],
#           dtype=[('y', '<i8'), ('y_2', '<i8'), ('y_str', '<U4')])
res['y_str'][:7]
# array(['even', 'odd', 'even', 'odd', 'even', 'odd', 'even'], dtype='<U4')
res.y_2[:7]
# array([ 0,  1,  4,  9, 16, 25, 36])

I have run several tests, and this is significantly faster than the original loop.
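
A rough way to check this yourself is sketched below. It is not the benchmark used above: the helper names `build_with_loop` and `build_with_numpy` are made up for illustration, `MAX` is reduced so the check runs quickly, and the explicit `dtype=np.int64` is an added assumption to guard against overflow in `y**2` on platforms where the default integer is 32-bit.

```python
import time
import numpy as np

MAX = 100_000  # smaller than the question's 1000000 to keep the check quick


def build_with_loop(n_max):
    # Pure-Python baseline; produces the same rows as the loop in the question.
    return [[n, n ** 2, 'even' if n % 2 == 0 else 'odd'] for n in range(n_max)]


def build_with_numpy(n_max):
    # Vectorized version: three column arrays combined into a record array.
    y = np.arange(n_max, dtype=np.int64)
    y_2 = y ** 2
    y_str = np.where(y % 2 == 0, 'even', 'odd')
    return np.rec.fromarrays((y, y_2, y_str), names=('y', 'y_2', 'y_str'))


start = time.perf_counter()
rows = build_with_loop(MAX)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
res = build_with_numpy(MAX)
numpy_seconds = time.perf_counter() - start

# Compare column by column to confirm both approaches hold the same values.
same_values = (
    res['y'].tolist() == [row[0] for row in rows]
    and res['y_2'].tolist() == [row[1] for row in rows]
    and res['y_str'].tolist() == [row[2] for row in rows]
)

print('loop:  {:.4f} s'.format(loop_seconds))
print('numpy: {:.4f} s'.format(numpy_seconds))
print('same values:', same_values)
```

Single-run timings like these are noisy, so treat them as a sanity check rather than a measurement; %%timeit or the timeit module gives more reliable numbers.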

tom10 · Answer

For large arrays of the same type, numpy is the way to go. But np.where is relatively slow, so if you just want 'odd' and 'even', you can use np.tile or something like it:

MAX = 1000000

%%timeit
y = np.arange(MAX)
ystr = np.where(y%2==0,'even','odd')
#  14.9 ms ± 61.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
temp = np.array(['even', 'odd'])
ystr = np.tile(temp, MAX//2)  # covers all MAX entries only when MAX is even
# 4.1 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So tile is about 3.6x faster here (14.9 ms vs. 4.1 ms).

If you want something more complex, I'd still try to avoid where if speed is important. There's almost always a way: the logic inside a where is usually simple, so it's easy to rewrite that logical expression as an expression between numpy arrays. (To be clear, numpy with where is still much faster than pure Python lists; it's just usually slow relative to other numpy options.)
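
As a sketch of what such a rewrite can look like (these two examples are illustrative and are not timed here, so whether they beat where on your data is something to measure): a numeric where can often be folded into arithmetic, and a where that picks between labels can be replaced by indexing a small lookup array. The names `signed_expr`, `labels`, and `ystr_lookup` are made up for this example.

```python
import numpy as np

y = np.arange(10)

# Numeric case: keep even values, negate odd ones.
signed_where = np.where(y % 2 == 0, y, -y)
# Same result as plain arithmetic: the factor is +1 for even, -1 for odd.
signed_expr = y * (1 - 2 * (y % 2))

# Label case: y % 2 is an array of 0s and 1s, so it can index a
# two-element lookup table directly (0 -> 'even', 1 -> 'odd').
labels = np.array(['even', 'odd'])
ystr_where = np.where(y % 2 == 0, 'even', 'odd')
ystr_lookup = labels[y % 2]

print(np.array_equal(signed_where, signed_expr))
print(np.array_equal(ystr_where, ystr_lookup))
```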

The other two columns are straightforward:

y = np.arange(MAX)
y2 = y**2

Personally, I'd just put these together in a list:

result = [y, y2, ystr]

Putting this all together (using tile), I get:

# 6.82 ms ± 84.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
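
For reference, the pieces above combine into something like the following sketch. Two details are additions rather than part of the timed code: the explicit `dtype=np.int64`, which guards against overflow in `y**2` where the default integer is 32-bit, and the `(MAX + 1) // 2` with a trailing slice, which keeps the length right when MAX is odd.

```python
import numpy as np

MAX = 1000000

y = np.arange(MAX, dtype=np.int64)  # explicit 64-bit so y**2 cannot overflow
y2 = y ** 2

# Repeat ['even', 'odd'] enough times to cover MAX entries, then trim.
# The trim only has an effect when MAX is odd.
labels = np.array(['even', 'odd'])
ystr = np.tile(labels, (MAX + 1) // 2)[:MAX]

# Three separate column arrays kept together in a plain list.
result = [y, y2, ystr]

print(len(result), len(ystr))
```

Here result[0][n], result[1][n], and result[2][n] hold n, n**2, and the parity label, so the layout is column-wise rather than one [n, n**2, label] row per entry as in the question.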

Tags: performance, python, list, numpy, len
