Python- np.mean() giving wrong means?

Tags:

The issue

So I have 50 netCDF4 data files that contain decades of monthly temperature predictions on a global grid. I'm using np.mean() to make an ensemble average of all 50 data files together while preserving time length & spatial scale, but np.mean() gives me two different answers. The first time I run its block of code, it gives me a number that, when averaged over latitude & longitude & plotted against the individual runs, is slightly lower than what the ensemble mean should be. If I re-run the block, it gives me a different mean which looks correct.

The code

I can't copy every line here since it's long, but here's what I do for each run.

#Historical (1950-2020) data
ncin_1 = Dataset("/project/wca/AR5/CanESM2/monthly/histr1/tas_Amon_CanESM2_historical-r1_r1i1p1_195001-202012.nc") #Import data file
tash1 = ncin_1.variables['tas'][:] #extract tas (temperature) variable
ncin_1.close() #close to save memory

#Repeat for future (2021-2100) data
ncin_1 = Dataset("/project/wca/AR5/CanESM2/monthly/histr1/tas_Amon_CanESM2_historical-r1_r1i1p1_202101-210012.nc")
tasr1 = ncin_1.variables['tas'][:]
ncin_1.close()

#Concatenate historical & future files together to make one time series array
tas11 = np.concatenate((tash1,tasr1),axis=0)

#Subtract the 1950-1979 mean to obtain anomalies
tas11 = tas11 - np.mean(tas11[0:359],axis=0,dtype=np.float64)

And I repeat that 49 times more for other datasets. Each tas11, tas12, etc file has the shape (1812, 64, 128) corresponding to time length in months, latitude, and longitude.

To get the ensemble mean, I do the following.

#Move all tas data to one array
alltas = np.zeros((1812,64,128,51)) #years, lat, lon, members (no ensemble mean value yet)
alltas[:,:,:,0] = tas11
(...)
alltas[:,:,:,49] = tas50

#Calculate ensemble mean & fill into 51st slot in axis 3
alltas[:,:,:,50] = np.mean(alltas,axis=3,dtype=np.float64)

When I check a coordinate & month, the ensemble mean is off from what it should be. Here's what a plot of globally averaged temperatures from 1950-2100 looks like with the first mean (with monhly values averaged into annual values. Black line is ensemble mean & colored lines are individual runs.

enter image description here

Obviously that deviated below the real ensemble mean. Here's what the plot looks like when I run alltas[:,:,:,50]=np.mean(alltas,axis=3,dtype=np.float64) a second time & keep everything else the same.

enter image description here

Much better.

The question

Why does np.mean() calculate the wrong value the first time? I tried specifying the data type as a float when using np.mean() like in this question- Wrong numpy mean value? But it didn't work. Any way I can fix it so it works correctly the first time? I don't want this problem to occur on a calculation where it's not so easy to notice a math error.

967

asked Jul 25 '16 20:07

ChristineB

1 Answers

In the line

alltas[:,:,:,50] = np.mean(alltas,axis=3,dtype=np.float64)

the argument to mean should be alltas[:,:,:,:50]:

alltas[:,:,:,50] = np.mean(alltas[:,:,:,:50], axis=3, dtype=np.float64)

Otherwise you are including those final zeros in the calculation of the ensemble means.

answered Sep 24 '22 04:09

Warren Weckesser

Related questions
                            
                                Differences of transpose functions in python
                            
                                Connect line of scatter plot on pandas DataFrame
                            
                                import tensor with mat format to tensorflow
                            
                                How to make kafka-python or pykafka work as an async producer with uwsgi and gevent?
                            
                                Why are there so many improper underlines when I debugging python code in Visual Studio Code
                            
                                Plotting discrete colorbar in legend style using Matplotlib
                            
                                Aiohttp, Asyncio: RuntimeError: Event loop is closed
                            
                                How can I copy style of an existing Line2D object to a plot() call? (matplotlib)
                            
                                H5py store list of list of strings
                            
                                pandas DataFrame add fill_value NotImplementedError
                            
                                Why does Python's eval(input("Enter input: ")) change input's datatype?
                            
                                django - AttributeError: 'NoneType' object has no attribute 'first_name'
                            
                                How to transform vector into unit vector in Tensorflow
                            
                                Finding conditional probability of trigram in python nltk
                            
                                split sentence without space in python (nltk?)
                            
                                Inverse filtering using Python
                            
                                Python OpenCV plot circles at a list of centre coordinates
                            
                                Python regular expression to replace everything but specific words
                            
                                How to unittest Flask websocket server (Flask-SocketIO)
                            
                                How to make a Luigi task generate an in-memory list as target

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python- np.mean() giving wrong means?

Tags:

python

numpy

mean

ChristineB

People also ask

1 Answers

Warren Weckesser

Recent Activity

Donate For Us