Suppose I have a Python array a = [3, 5, 2, 7, 5, 3, 6, 8, 4]. My goal is to iterate through this array 3 elements at a time, returning the mean of the top 2 of the three elements.
Using the above array, during my iteration step, the first three elements are [3, 5, 2] and the mean of the top 2 elements is 4. The next three elements are [5, 2, 7] and the mean of the top 2 elements is 6. The next three elements are [2, 7, 5] and the mean of the top 2 elements is again 6. ...
Hence, the result for the above array would be [4, 6, 6, 6, 5.5, 7, 7].
What is the nicest way to write such a function?
You can use some fancy slicing of your list to manipulate subsets of elements. Simply grab each three-element sublist, find the top two elements, then take their simple average (a.k.a. mean) and add it to a result list.
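def get_means(input_list):
    means = []
    # range() works on Python 2 and 3; xrange() exists only on Python 2.
    for i in range(len(input_list) - 2):
        three_elements = input_list[i:i + 3]
        # Dropping the smallest value leaves the sum of the top two.
        sum_top_two = sum(three_elements) - min(three_elements)
        means.append(sum_top_two / 2.0)
    return means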
You can see your example input (and desired result) like so:
print(get_means([3, 5, 2, 7, 5, 3, 6, 8, 4]))
# [4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
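The function above finds the top two values by subtracting the minimum, which only works for "top 2 of 3". As a rough sketch of a sort-based variant that generalises to other sizes (the name `get_top_means` and the parameters `size` and `k` are my own hypothetical additions, not part of the original answer):

```python
def get_top_means(input_list, size=3, k=2):
    """Mean of the k largest values in each sliding window of `size` items."""
    means = []
    # The last full window starts at index len(input_list) - size.
    for i in range(len(input_list) - size + 1):
        window = input_list[i:i + size]
        top_k = sorted(window, reverse=True)[:k]  # the k largest values
        means.append(sum(top_k) / float(k))
    return means


print(get_top_means([3, 5, 2, 7, 5, 3, 6, 8, 4]))
# [4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
```

With the defaults it gives the same result as get_means; an input shorter than `size` simply yields an empty list.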
There are some other great answers that focus more on performance, including one that uses a generator to avoid building large in-memory lists: https://stackoverflow.com/a/49001728/416500
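To illustrate the generator idea in general terms (this is my own minimal sketch, not the code from the linked answer, and `iter_means` is a hypothetical name), the same loop can yield each mean as it is computed instead of collecting them in a list:

```python
def iter_means(input_list):
    """Lazily yield the mean of the top two values of each 3-element window."""
    for i in range(len(input_list) - 2):
        three_elements = input_list[i:i + 3]
        # Sum of the window minus its minimum is the sum of the top two.
        yield (sum(three_elements) - min(three_elements)) / 2.0


# Values are produced one at a time; no result list is built.
for value in iter_means([3, 5, 2, 7, 5, 3, 6, 8, 4]):
    print(value)
```

Wrap the call in list(...) if you do want all the results at once.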
I believe in splitting the code into parts. Here that would be getting the sliding window, getting the top 2 elements, and calculating the mean. The cleanest way to do this is using generators.
Slight variation on evamicur's answer using tee, islice and zip to create the window:
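import itertools
def windowed_iterator(iterable, n=2):
    # tee() gives n independent iterators over the same input.
    iterators = itertools.tee(iterable, n)
    # Skip the first i items of the i-th iterator so they line up as a window.
    iterators = (itertools.islice(it, i, None) for i, it in enumerate(iterators))
    yield from zip(*iterators)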
windows = windowed_iterator(iterable=a, n=3)
[(3, 5, 2), (5, 2, 7), (2, 7, 5), (7, 5, 3), (5, 3, 6), (3, 6, 8), (6, 8, 4)]
To calculate the mean of the 2 highest, you can use any of the methods used in the other answers. I think the heapq one is the clearest:
from heapq import nlargest
top_n = map(lambda x: nlargest(2, x), windows)
or equivalently
top_n = (nlargest(2, i) for i in windows)
[[5, 3], [7, 5], [7, 5], [7, 5], [6, 5], [8, 6], [8, 6]]
from statistics import mean
means = map(mean, top_n)
[4, 6, 6, 6, 5.5, 7, 7]
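Putting the three steps together, a self-contained sketch might look like the following. The wrapper name `top_two_means` and its parameters `n` and `k` are hypothetical, added only to tie the pieces into one runnable script (Python 3.4+ assumed, for `yield from` and `statistics`):

```python
import itertools
from heapq import nlargest
from statistics import mean


def windowed_iterator(iterable, n=2):
    """Yield overlapping tuples of n consecutive items."""
    iterators = itertools.tee(iterable, n)
    iterators = (itertools.islice(it, i, None) for i, it in enumerate(iterators))
    yield from zip(*iterators)


def top_two_means(iterable, n=3, k=2):
    """Lazily compute the mean of the k largest items of each n-item window."""
    windows = windowed_iterator(iterable, n)
    top_n = (nlargest(k, window) for window in windows)
    return map(mean, top_n)


a = [3, 5, 2, 7, 5, 3, 6, 8, 4]
# Everything is lazy until list() consumes the pipeline.
print(list(top_two_means(a)))
```

Nothing is computed until the result is consumed, so the same pipeline also works on iterables that are too large to hold in memory.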