Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python- np.mean() giving wrong means?

Tags:

python

numpy

mean

The issue

So I have 50 netCDF4 data files that contain decades of monthly temperature predictions on a global grid. I'm using np.mean() to make an ensemble average of all 50 data files together while preserving time length & spatial scale, but np.mean() gives me two different answers. The first time I run its block of code, it gives me a number that, when averaged over latitude & longitude & plotted against the individual runs, is slightly lower than what the ensemble mean should be. If I re-run the block, it gives me a different mean which looks correct.

The code

I can't copy every line here since it's long, but here's what I do for each run.

#Historical (1950-2020) data
ncin_1 = Dataset("/project/wca/AR5/CanESM2/monthly/histr1/tas_Amon_CanESM2_historical-r1_r1i1p1_195001-202012.nc") #Import data file
tash1 = ncin_1.variables['tas'][:] #extract tas (temperature) variable
ncin_1.close() #close to save memory

#Repeat for future (2021-2100) data
ncin_1 = Dataset("/project/wca/AR5/CanESM2/monthly/histr1/tas_Amon_CanESM2_historical-r1_r1i1p1_202101-210012.nc")
tasr1 = ncin_1.variables['tas'][:]
ncin_1.close()

#Concatenate historical & future files together to make one time series array
tas11 = np.concatenate((tash1,tasr1),axis=0)

#Subtract the 1950-1979 mean to obtain anomalies
tas11 = tas11 - np.mean(tas11[0:359],axis=0,dtype=np.float64)

And I repeat that 49 times more for other datasets. Each tas11, tas12, etc file has the shape (1812, 64, 128) corresponding to time length in months, latitude, and longitude.

To get the ensemble mean, I do the following.

#Move all tas data to one array
alltas = np.zeros((1812,64,128,51)) #years, lat, lon, members (no ensemble mean value yet)
alltas[:,:,:,0] = tas11
(...)
alltas[:,:,:,49] = tas50

#Calculate ensemble mean & fill into 51st slot in axis 3
alltas[:,:,:,50] = np.mean(alltas,axis=3,dtype=np.float64)

When I check a coordinate & month, the ensemble mean is off from what it should be. Here's what a plot of globally averaged temperatures from 1950-2100 looks like with the first mean (with monhly values averaged into annual values. Black line is ensemble mean & colored lines are individual runs.

enter image description here

Obviously that deviated below the real ensemble mean. Here's what the plot looks like when I run alltas[:,:,:,50]=np.mean(alltas,axis=3,dtype=np.float64) a second time & keep everything else the same.

enter image description here

Much better.

The question

Why does np.mean() calculate the wrong value the first time? I tried specifying the data type as a float when using np.mean() like in this question- Wrong numpy mean value? But it didn't work. Any way I can fix it so it works correctly the first time? I don't want this problem to occur on a calculation where it's not so easy to notice a math error.

like image 967
ChristineB Avatar asked Jul 25 '16 20:07

ChristineB


People also ask

How is NP mean () different from NP average () in NumPy?

mean always calculates the arithmetic mean. np. average has an optional weights parameter that can be used to calculate a weighted average.

How do you find the mean of a NumPy in Python?

You get the mean by calculating the sum of all values in a Numpy array divided by the total number of values. By default, the average is taken from the flattened array (from all array elements), otherwise along with the specified axis.

Is there a mean function in NumPy?

Compute the arithmetic mean along the specified axis. Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis.

What does 3 mean in NumPy?

The 3 is part of that shape tuple. You will reference the dimensions by number in subsequent numpy code. arr. shape[2] will return 3 , and arr[:,:,0] will be all the R values of the image (if that is the correct interpreation).


1 Answers

In the line

alltas[:,:,:,50] = np.mean(alltas,axis=3,dtype=np.float64)

the argument to mean should be alltas[:,:,:,:50]:

alltas[:,:,:,50] = np.mean(alltas[:,:,:,:50], axis=3, dtype=np.float64)

Otherwise you are including those final zeros in the calculation of the ensemble means.

like image 80
Warren Weckesser Avatar answered Sep 24 '22 04:09

Warren Weckesser