Python iterate through array while finding the mean of the top k elements

Suppose I have a Python list a = [3, 5, 2, 7, 5, 3, 6, 8, 4]. My goal is to slide a window of 3 elements over this list, one step at a time, returning the mean of the top 2 of the three elements in each window.

Using the above array, during my iteration step, the first three elements are [3, 5, 2] and the mean of the top 2 elements is 4. The next three elements are [5, 2, 7] and the mean of the top 2 elements is 6. The next three elements are [2, 7, 5] and the mean of the top 2 elements is again 6. ...

Hence, the result for the above array would be [4, 6, 6, 6, 5.5, 7, 7].

What is the nicest way to write such a function?

Student asked Feb 27 '18 03:02


2 Answers

Solution

You can use list slicing to work on each window of elements. Grab each three-element sublist, drop its smallest element (the sum minus the minimum is the sum of the top two), take the simple average (a.k.a. mean) of what remains, and append it to a result list.

Code

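def get_means(input_list):
    means = []
    # range works on Python 3 (xrange exists only on Python 2)
    for i in range(len(input_list)-2):
        three_elements = input_list[i:i+3]
        # sum of the top two = sum of all three minus the smallest
        sum_top_two = sum(three_elements) - min(three_elements)
        means.append(sum_top_two/2.0)
    return means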

Example

Running it on your example input gives the desired result:

print(get_means([3, 5, 2, 7, 5, 3, 6, 8, 4]))
# [4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
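
The question title mentions the top k elements in general. A minimal sketch of how the function above might be generalized, assuming Python 3; the name get_top_k_means and the parameters window and k are made up for illustration and are not part of the original answer:

```python
from heapq import nlargest

def get_top_k_means(input_list, window=3, k=2):
    """Mean of the k largest values in each sliding window (hypothetical helper)."""
    if not 1 <= k <= window:
        raise ValueError("k must be between 1 and window")
    means = []
    # one window per start index; no windows if the list is shorter than window
    for i in range(len(input_list) - window + 1):
        top_k = nlargest(k, input_list[i:i + window])
        means.append(sum(top_k) / float(k))
    return means

print(get_top_k_means([3, 5, 2, 7, 5, 3, 6, 8, 4]))
# [4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
```

With the defaults (window=3, k=2) this should match get_means on the example input.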

And more...

There are some other great answers that focus more on performance, including one that uses a generator to avoid building large in-memory lists: https://stackoverflow.com/a/49001728/416500
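
To give a rough idea of the generator approach, here is a minimal sketch. It is not the code from the linked answer; the name iter_means is hypothetical, and it simply yields each mean lazily instead of collecting them in a list:

```python
def iter_means(input_list):
    """Lazily yield the mean of the top two values of each three-element window."""
    for i in range(len(input_list) - 2):
        three_elements = input_list[i:i + 3]
        # sum of the top two = sum of all three minus the smallest
        yield (sum(three_elements) - min(three_elements)) / 2.0

for m in iter_means([3, 5, 2, 7, 5, 3, 6, 8, 4]):
    print(m)
```

Nothing is computed until the generator is iterated, so only one window's result is held at a time.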

foslock answered Sep 18 '22 16:09


I believe in splitting the code into parts. Here that would be three: getting the sliding window, getting the top 2 elements, and calculating the mean. The cleanest way to do this is with generators.

Sliding window

Slight variation on evamicur's answer using tee, islice and zip to create the window:

import itertools

def windowed_iterator(iterable, n=2):
    # tee gives n independent iterators; the i-th one is advanced by i items
    iterators = itertools.tee(iterable, n)
    iterators = (itertools.islice(it, i, None) for i, it in enumerate(iterators))
    yield from zip(*iterators)

windows = windowed_iterator(iterable=a, n=3)
# yields (3, 5, 2), (5, 2, 7), (2, 7, 5), (7, 5, 3), (5, 3, 6), (3, 6, 8), (6, 8, 4)

Top 2 elements

To get the 2 highest values of each window you can use any of the methods from the other answers; I think the heapq one is the clearest.

from heapq import nlargest
top_n = map(lambda x: nlargest(2, x), windows)

or equivalently

top_n = (nlargest(2, i) for i in windows)
# yields [5, 3], [7, 5], [7, 5], [7, 5], [6, 5], [8, 6], [8, 6]

Mean

from statistics import mean
means = map(mean, top_n)
# yields 4, 6, 6, 6, 5.5, 7, 7
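
Put together, the three steps can be chained into one lazy pipeline. A sketch, assuming Python 3; windowed_iterator is repeated from above so the block runs on its own, and the wrapper name top_two_means is hypothetical:

```python
import itertools
from heapq import nlargest
from statistics import mean

def windowed_iterator(iterable, n=2):
    # tee gives n independent iterators; the i-th one is advanced by i items
    iterators = itertools.tee(iterable, n)
    iterators = (itertools.islice(it, i, None) for i, it in enumerate(iterators))
    yield from zip(*iterators)

def top_two_means(iterable, n=3):
    """Chain the steps: sliding window -> two largest -> mean (hypothetical wrapper)."""
    windows = windowed_iterator(iterable, n)
    top_n = (nlargest(2, w) for w in windows)
    return map(mean, top_n)

a = [3, 5, 2, 7, 5, 3, 6, 8, 4]
print(list(top_two_means(a)))
# [4, 6, 6, 6, 5.5, 7, 7]
```

Each stage is an iterator, so the values are only computed when the final result is consumed, here by list().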
Maarten Fabré answered Sep 19 '22 16:09