
Fastest way of calculating mean values for each indices

I have two 2D arrays of equal shape: given_array and reference_array. For each unique value in reference_array, I have to write to a file the mean of the values in given_array at the positions where reference_array equals that unique value.

import numpy as np

given_array = np.array([[2,4,5,8,9,11,15],[1,2,3,4,5,6,7]])

reference_array = np.array([[2,2,2,8,8,8,15],[2,2,2,4,8,8,9]])

unique_value = np.unique(reference_array)

file_out = open('file_out', 'w')

for unique in unique_value:
    index = reference_array == unique
    mean = np.mean(given_array[index])
    file_out.write(str(unique) + ',' + str(mean) + '\n')

file_out.close()

The above code works, but in my real problem the two arrays are extremely large (read from a raster image), and the processing takes a few days to complete.

I would be grateful if someone could suggest a faster way of producing the same result.

Borys asked Dec 25 '22 01:12

1 Answer

Going through the arrays only once might be faster, even though it uses pure Python:

from collections import defaultdict

# unique maps each reference value to a (sum, count) pair
# accumulated over all given values sharing that reference value
unique = defaultdict(lambda: (0.0, 0))
for ref, value in zip(reference_array.flat, given_array.flat):
    sum_, count = unique[ref]
    unique[ref] = (sum_ + float(value), count + 1)

with open('file.out', 'w') as out:
    for ref, (sum_, count) in unique.items():
        out.write('%d,%f\n' % (ref, sum_ / count))

In contrast to the OP's solution, finding the unique values and accumulating the data for the means is done in a single pass. unique is a dictionary whose keys are the reference values and whose values are pairs holding the sum and count of all the given values that share that reference value. After the loop, every unique reference value is a key in unique, and every given element has been folded into the sum and count for its reference value, from which the mean is easily computed in a second step.

The complexity of the problem was thus reduced from O(size_of_array × number_of_unique_values) to O(size_of_array + number_of_unique_values).
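For completeness, the same result can also be obtained without any Python-level loop over the elements, using np.unique with return_inverse together with np.bincount. This is a vectorized sketch, not part of the original answer; the arrays and the file name file.out are taken from above:

```python
import numpy as np

given_array = np.array([[2, 4, 5, 8, 9, 11, 15], [1, 2, 3, 4, 5, 6, 7]])
reference_array = np.array([[2, 2, 2, 8, 8, 8, 15], [2, 2, 2, 4, 8, 8, 9]])

# inverse[i] is the index into uniques of the i-th element of reference_array
uniques, inverse = np.unique(reference_array, return_inverse=True)
inverse = inverse.ravel()  # NumPy >= 2.0 returns inverse in the input's shape

# A weighted bincount sums the given values falling on each unique
# reference value; a plain bincount counts them.
sums = np.bincount(inverse, weights=given_array.ravel().astype(float))
counts = np.bincount(inverse)
means = sums / counts

with open('file.out', 'w') as out:
    for ref, mean in zip(uniques, means):
        out.write('%d,%f\n' % (ref, mean))
```

Both bincount calls run in C, so the per-element work happens outside the interpreter, which typically matters a lot at raster-image sizes.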

Daniel answered Dec 27 '22 14:12