
Counting the number of times a threshold is met or exceeded in a multidimensional array in Python

I have a NumPy array that I read in from a netCDF file. Its shape is (930, 360, 720), organized as (time, latitudes, longitudes).

For each lat/lon pair, I need to count how many of the 930 time stamps have a value that meets or exceeds a threshold "x" (such as 0.2 or 0.5), and ultimately calculate the percentage of time stamps at which the threshold was met or exceeded at each point. I then want to output the results so they can be plotted later on.

I have attempted numerous methods but here is my most recent:

lat_length = len(lats)  # lats was defined earlier when unpacked from the netCDF dataset
lon_length = len(lons)  # same as lats; both were defined before calling np.meshgrid(lons, lats)

for i in range(0, lat_length):
     for j in range(0, lon_length):
          if ice[:,i,j] >= x:
               #code to count number of occurrences here
               #code to calculate percentage here
               percent_ice[i,j] += count / len(time) #calculation 

 #then go on to plot percent_ice

I hope this makes sense! I would greatly appreciate any help. I'm self taught in Python so I may be missing something simple.

Would this be a time to use the any() function? What would be the most efficient way to count the number of times the threshold was exceeded and then calculate the percentage?

wxgirl1031 asked Mar 13 '23 21:03



2 Answers

Compare the 3D input array with the threshold x to get a boolean array, then sum along the first (time) axis with ndarray.sum(axis=0) to get the count at each lat/lon point. Dividing by the length of that axis gives the ratios, like so -

import numpy as np

# Count the time steps at which the threshold x is met or exceeded (>=), per lat/lon point
count = (ice >= x).sum(axis=0)

# Get ratios by dividing by the first axis length; multiply by 100 for percentages
percent_ice = np.true_divide(count, ice.shape[0])
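As a quick sanity check of this approach, here is a minimal sketch on a small synthetic array. The array contents, the threshold value, and the helper name `threshold_fraction` are made up for illustration and do not come from the question. It uses `>=` so that values equal to the threshold are counted, and it relies on the mean of a boolean array along an axis being the count divided by the axis length.

```python
import numpy as np

def threshold_fraction(data, x):
    """Fraction of time steps (axis 0) at which data >= x, per grid point."""
    # Boolean array: True where the threshold is met or exceeded
    hits = data >= x
    # Averaging booleans along the time axis gives count / n_times
    return hits.mean(axis=0)

# Hypothetical stand-in for the (time, lat, lon) array: 4 times, 2 lats, 3 lons
ice = np.array([
    [[0.0, 0.2, 0.9], [0.1, 0.5, 0.3]],
    [[0.0, 0.1, 0.8], [0.6, 0.5, 0.3]],
    [[0.0, 0.3, 0.7], [0.1, 0.5, 0.3]],
    [[0.0, 0.5, 0.6], [0.7, 0.5, 0.3]],
])

# Scale the fraction to a percentage for plotting
percent_ice = 100.0 * threshold_fraction(ice, 0.5)
print(percent_ice)
```

The result has shape (lat, lon), so it should be usable directly with the 2D grids from np.meshgrid(lons, lats) when plotting.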
Divakar answered May 03 '23 19:05



Ah, look, another meteorologist!

There are probably multiple ways to do this and my solution is unlikely to be the fastest since it uses numpy's MaskedArray, which is known to be slow, but this should work:

Numpy has a data type called a MaskedArray which actually contains two normal numpy arrays. It contains a data array as well as a boolean mask. I would first mask all data that are greater than or equal to my threshold (use np.ma.masked_greater() for just greater than):

ice = np.ma.masked_greater_equal(ice, x)

You can then use ice.count() to determine how many values are below your threshold for each lat/lon point by specifying that you want to count along a specific axis:

n_good = ice.count(axis=0)

This should return a 2-dimensional array containing, for each lat/lon point, the number of good (unmasked, below-threshold) values. You can then calculate the number of bad values (those at or above the threshold) by subtracting n_good from ice.shape[0]:

n_bad = ice.shape[0] - n_good

and calculate the fraction that are bad (multiply by 100 for a percentage) using:

perc_bad = n_bad/float(ice.shape[0])

There are plenty of ways to do this without using MaskedArray. This is just the easy way that comes to mind for me.
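Putting the MaskedArray steps together, a minimal end-to-end sketch might look like the following. The tiny array and the threshold value are invented for illustration, and the masked copy is stored under a separate name (`masked`, an arbitrary choice) so the original data stay available for a cross-check against a plain boolean comparison.

```python
import numpy as np

# Hypothetical stand-in for the (time, lat, lon) array: 3 times, 1 lat, 2 lons
ice = np.array([
    [[0.1, 0.6]],
    [[0.5, 0.7]],
    [[0.2, 0.4]],
])
x = 0.5  # example threshold

# Mask every value that meets or exceeds the threshold
masked = np.ma.masked_greater_equal(ice, x)

# count() returns the number of unmasked (below-threshold) values per point
n_good = masked.count(axis=0)

# Everything that is not "good" met or exceeded the threshold
n_bad = ice.shape[0] - n_good
perc_bad = n_bad / float(ice.shape[0])

# Cross-check against the plain boolean approach
direct = (ice >= x).mean(axis=0)
print(n_bad, perc_bad, np.allclose(perc_bad, direct))
```

One thing to watch for: if the array read from the netCDF file is already a masked array with missing values masked, those points would presumably also end up in n_bad, since count() only sees unmasked values.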

Vorticity answered May 03 '23 17:05
