I have a bit of code that runs many thousands of times in my project:
def resample(freq, data):
    output = []
    for i, elem in enumerate(freq):
        for _ in range(elem):
            output.append(data[i])
    return output
e.g. resample([1,2,3], ['a', 'b', 'c'])
=> ['a', 'b', 'b', 'c', 'c', 'c']
I want to speed this up as much as possible. It seems like a list comprehension could be faster. I have tried:
def resample(freq, data):
    return [item for sublist in [[data[i]]*elem for i, elem in enumerate(freq)] for item in sublist]
This is hideous and also slow, because it builds a list of sublists and then flattens it. Is there a way to do this with a one-line list comprehension that is fast? Or maybe something with numpy?
Thanks in advance!
Edit: the answer does not necessarily need to eliminate the nested loops; the fastest code is the best.
I highly suggest using the lazy iterators from itertools, like so:
from itertools import repeat, chain

def resample(freq, data):
    return chain.from_iterable(map(repeat, data, freq))
This will probably be one of the fastest pure-Python methods: map(), repeat() and chain.from_iterable() are all implemented in C (in CPython), so the per-item looping happens outside the interpreter.
As for a small explanation:
repeat(i, n) returns an iterator that repeats an item i, n times.
map(repeat, data, freq) returns an iterator that calls repeat on an element of data and the matching element of freq each time. Basically, it is an iterator that returns repeat() iterators.
chain.from_iterable() flattens the iterator of iterators to return the end items.
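To make the three stages concrete, here is a small sketch that materialises each intermediate step with list() purely for inspection. The variable names (stages, expanded, flat) are just illustrative; the real resample() never builds these intermediate lists.

```python
from itertools import chain, repeat

freq = [1, 2, 3]
data = ['a', 'b', 'c']

# Stage 1: repeat(item, n) yields item exactly n times.
assert list(repeat('b', 2)) == ['b', 'b']

# Stage 2: map() pairs each element of data with its count in freq
# and produces one repeat() iterator per pair.
stages = map(repeat, data, freq)
expanded = [list(r) for r in stages]  # materialised only for display
print(expanded)  # [['a'], ['b', 'b'], ['c', 'c', 'c']]

# Stage 3: chain.from_iterable() walks those iterators one after another.
flat = list(chain.from_iterable(map(repeat, data, freq)))
print(flat)  # ['a', 'b', 'b', 'c', 'c', 'c']
```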
No intermediate list is created along the way, so there is no extra memory overhead, and as an added benefit you can use any type of data, not just one-character strings.
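One thing to watch out for with the lazy version: the result is a single-use iterator. A quick sketch, assuming the resample() one-liner from this answer (the records list is made-up sample data):

```python
from itertools import chain, repeat

def resample(freq, data):
    # Same one-liner as in the answer; returns a lazy iterator.
    return chain.from_iterable(map(repeat, data, freq))

# Works with any objects, not only one-character strings.
records = [{'id': 1}, (2, 3), None]
it = resample([2, 0, 1], records)

first_pass = list(it)   # consumes the iterator
second_pass = list(it)  # nothing left: iterators are single-use

print(first_pass)   # [{'id': 1}, {'id': 1}, None]
print(second_pass)  # []

# Repeated items are references to the same object, not copies
# (this is also true of the original nested-loop version).
print(first_pass[0] is first_pass[1])  # True
```

If you need to iterate over the result more than once, either call resample() again or store the output in a list.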
While I don't suggest it, you are able to convert it into a list() like so:
result = list(resample([1,2,3], ['a','b','c']))
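Since "fastest" depends on your data sizes and Python version, it is worth measuring on your own inputs rather than taking anyone's word for it. Below is a rough timing sketch, not a rigorous benchmark. The function names, the compare() helper and the sample sizes are all my own choices for illustration. It also includes a flat one-line comprehension and, if NumPy happens to be installed, np.repeat, which accepts a per-element repeat count.

```python
import timeit
from itertools import chain, repeat

try:
    import numpy as np  # optional; only needed for the NumPy variant
except ImportError:
    np = None

def resample_loop(freq, data):
    # The original nested-loop version from the question.
    output = []
    for i, elem in enumerate(freq):
        for _ in range(elem):
            output.append(data[i])
    return output

def resample_comprehension(freq, data):
    # Flat one-line comprehension, no intermediate sublists.
    return [item for item, n in zip(data, freq) for _ in range(n)]

def resample_itertools(freq, data):
    # The itertools approach, materialised so the comparison is fair.
    return list(chain.from_iterable(map(repeat, data, freq)))

def resample_numpy(freq, data):
    # np.repeat takes one repeat count per element of data.
    return np.repeat(data, freq).tolist()

def compare(freq, data, number=200):
    """Return {function name: total seconds} for each available variant."""
    variants = [resample_loop, resample_comprehension, resample_itertools]
    if np is not None:
        variants.append(resample_numpy)
    return {
        f.__name__: timeit.timeit(lambda: f(freq, data), number=number)
        for f in variants
    }

if __name__ == "__main__":
    freq = list(range(50))
    data = list(range(50))
    for name, seconds in compare(freq, data).items():
        print(f"{name}: {seconds:.4f}s")
```

Note that the NumPy variant converts data to a homogeneous array first, so it suits numeric or same-typed data better than lists of mixed objects, and the conversion cost may outweigh the gain for small inputs.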