What is the most efficient way to compute the running mean and median of a Python list, i.e. the mean and median of each successively longer prefix of the list?
For example, my list:
input_list = [1,2,4,6,7,8]
I want to produce an output list that contains:
output_list_mean = [1,1.5,2.3,3.25,4,4.7]
output_list_median = [1,1.5,2.0,3.0,4.0,5.0]
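where the k-th mean is the average of the first k elements (e.g. for k = 3: (1 + 2 + 4) / 3 ≈ 2.33),
and the k-th median is the median of the first k elements (e.g. for k = 4: the median of [1, 2, 4, 6] is (2 + 4) / 2 = 3.0).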
I have tried to implement it with the following loop, but it seems very inefficient.
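import numpy
input_list = [1,2,4,6,7,8]
for item in range(1, len(input_list) + 1):
    # Each iteration recomputes the statistics of the prefix from scratch,
    # so the loop as a whole is O(n^2) for the mean alone
    print(numpy.mean(input_list[:item]))
    print(numpy.median(input_list[:item]))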
Anything you implement yourself, especially for the median, will either take a lot of work or be very inefficient. Pandas, however, has efficient built-in implementations of exactly the functions you are after: the expanding mean is O(n), and the expanding median is O(n log n) using a skip list. The old top-level functions pd.expanding_mean and pd.expanding_median have since been removed from pandas; the current equivalent is the .expanding() method on a Series:
>>> import pandas as pd
>>> input_list = [1, 2, 4, 6, 7, 8]
>>> s = pd.Series(input_list)
>>> s.expanding().mean().round(5).tolist()
[1.0, 1.5, 2.33333, 3.25, 4.0, 4.66667]
>>> s.expanding().median().tolist()
[1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
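If pandas isn't an option, a running median can still be computed in O(n log n) with the standard two-heap technique: a max-heap holds the lower half of the values seen so far and a min-heap holds the upper half, so the median is always at the top of one or both heaps. This is a swapped-in technique, not what pandas does internally (pandas uses a skip list). A minimal sketch using only the standard library's heapq; the function name running_median is hypothetical:

```python
import heapq
from typing import Iterable, List


def running_median(values: Iterable[float]) -> List[float]:
    """Median of each prefix of `values` (hypothetical helper, two-heap technique)."""
    lower: List[float] = []  # max-heap of the lower half, stored as negated values
    upper: List[float] = []  # min-heap of the upper half
    medians: List[float] = []
    for x in values:
        # Push into the lower half, then move its largest element to the upper half,
        # which keeps every element of `lower` <= every element of `upper`
        heapq.heappush(lower, -x)
        heapq.heappush(upper, -heapq.heappop(lower))
        # Rebalance so `lower` holds the extra element when the count is odd
        if len(upper) > len(lower):
            heapq.heappush(lower, -heapq.heappop(upper))
        if len(lower) > len(upper):
            medians.append(float(-lower[0]))  # odd count: top of the lower half
        else:
            medians.append((-lower[0] + upper[0]) / 2)  # even count: average of the two middles
    return medians


print(running_median([1, 2, 4, 6, 7, 8]))  # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```

Each new element costs a constant number of heap pushes and pops, each O(log n), which is where the O(n log n) total comes from.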
You can use itertools.islice to slice your array and np.fromiter with np.mean:
>>> import numpy as np
>>> from itertools import islice
>>> arr = np.array([1, 2, 4, 6, 7, 8])
>>> l = arr.size
>>> [float(np.fromiter(islice(arr, 0, i + 1), float).mean()) for i in range(l)]
[1.0, 1.5, 2.3333333333333335, 3.25, 4.0, 4.666666666666667]
As an alternative, if you only need the mean, you can use np.cumsum to get the cumulative sum of your elements and divide it element-wise by the length of each prefix, np.arange(1, arr.size + 1), using np.true_divide. This is O(n):
>>> np.true_divide(np.cumsum(arr), np.arange(1, arr.size + 1))
array([1.        , 1.5       , 2.33333333, 3.25      , 4.        , 4.66666667])
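The same cumulative-sum idea also works without NumPy. As a minimal pure-Python sketch, assuming plain lists of numbers, itertools.accumulate produces the running totals and enumerate supplies the prefix lengths; the helper name running_mean is hypothetical:

```python
from itertools import accumulate
from typing import Iterable, List


def running_mean(values: Iterable[float]) -> List[float]:
    """Mean of each prefix of `values` (hypothetical helper): running total / prefix length."""
    # accumulate yields 1, 3, 7, ... for [1, 2, 4, ...]; enumerate counts the prefix length from 1
    return [total / count for count, total in enumerate(accumulate(values), start=1)]


print(running_mean([1, 2, 4, 6, 7, 8]))
# [1.0, 1.5, 2.3333333333333335, 3.25, 4.0, 4.666666666666667]
```

Like the np.cumsum version, this touches each element once, so it is O(n) rather than the O(n^2) of re-averaging every prefix.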