Performing grouped average and standard deviation with NumPy arrays

Tags: python, arrays, numpy

I have a set of data (X, Y). The values of my independent variable X are not unique, so many of them are repeated. I want to output new arrays containing: X_unique, the unique values of X; Y_mean, the mean of the Y values corresponding to each value in X_unique; and Y_std, the standard deviation of the Y values corresponding to each value in X_unique.

x = data[:,0]
y = data[:,1]


asked Jan 05 '16 17:01

obtmind

2 Answers

You can use binned_statistic from scipy.stats, which applies one of several statistic functions to chunks of a 1D array. To get the chunks, we need to sort the data by X and find the positions where one chunk ends and the next begins, for which np.unique is useful. Putting it all together, here's an implementation -

import numpy as np
from scipy.stats import binned_statistic as bstat

# Sort data corresponding to argsort of first column
sdata = data[data[:,0].argsort()]

# Unique first-column values and the index at which each group starts in the sorted data
unq_x,breaks = np.unique(sdata[:,0],return_index=True)
breaks = np.append(breaks,data.shape[0])

# Use binned statistic to get grouped average and std deviation values
idx_range = np.arange(data.shape[0])
avg_y,_,_ = bstat(x=idx_range, values=sdata[:,1], statistic='mean', bins=breaks)
std_y,_,_ = bstat(x=idx_range, values=sdata[:,1], statistic='std', bins=breaks)

According to the docs of binned_statistic, the statistic argument can also be a custom function:

function : a user-defined function which takes a 1D array of values, and outputs a single numerical statistic. This function will be called on the values in each bin. Empty bins will be represented by function([]), or NaN if this returns an error.

Sample input, output -

In [121]: data
Out[121]: 
array([[2, 5],
       [2, 2],
       [1, 5],
       [3, 8],
       [0, 8],
       [6, 7],
       [8, 1],
       [2, 5],
       [6, 8],
       [1, 8]])

In [122]: np.column_stack((unq_x,avg_y,std_y))
Out[122]: 
array([[ 0.        ,  8.        ,  0.        ],
       [ 1.        ,  6.5       ,  1.5       ],
       [ 2.        ,  4.        ,  1.41421356],
       [ 3.        ,  8.        ,  0.        ],
       [ 6.        ,  7.5       ,  0.5       ],
       [ 8.        ,  1.        ,  0.        ]])
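
As an illustration of the custom statistic option quoted from the docs, here is a minimal sketch that reuses the same sort-and-breaks approach on the sample data. The helper value_range is a hypothetical function made up for this example (it is not part of SciPy); it returns the spread (max - min) of the Y values in each group.

```python
import numpy as np
from scipy.stats import binned_statistic

# Same sample data as above: column 0 is x, column 1 is y
data = np.array([[2, 5], [2, 2], [1, 5], [3, 8], [0, 8],
                 [6, 7], [8, 1], [2, 5], [6, 8], [1, 8]])

def value_range(values):
    # Hypothetical custom statistic: spread (max - min) of one group's y values
    values = np.asarray(values)
    return values.max() - values.min()

# Sort rows by x, then find where each run of equal x values starts
sdata = data[data[:, 0].argsort()]
unq_x, breaks = np.unique(sdata[:, 0], return_index=True)
breaks = np.append(breaks, sdata.shape[0])

# Bin the row indices so that each bin covers exactly one group
idx_range = np.arange(sdata.shape[0])
range_y, _, _ = binned_statistic(idx_range, sdata[:, 1],
                                 statistic=value_range, bins=breaks)

print(np.column_stack((unq_x, range_y)))
```

Any callable that takes a 1D array and returns a single number should work the same way, though a Python callback is likely to be slower than the built-in 'mean' and 'std' options.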


answered Nov 15 '22 00:11

Divakar

x_unique  = np.unique(x)
y_means = np.array([np.mean(y[x==u]) for u in x_unique])
y_stds = np.array([np.std(y[x==u]) for u in x_unique])
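
The list comprehensions above scan the whole array once per unique value, which can get slow when there are many distinct X values. A possible loop-free variant (a different technique from the one above, offered as a sketch rather than a drop-in replacement) uses the return_inverse and return_counts options of np.unique together with np.bincount. The sample data here is the one from the other answer and is only an assumption for illustration.

```python
import numpy as np

# Sample data borrowed from the other answer: column 0 is x, column 1 is y
data = np.array([[2, 5], [2, 2], [1, 5], [3, 8], [0, 8],
                 [6, 7], [8, 1], [2, 5], [6, 8], [1, 8]])
x = data[:, 0]
y = data[:, 1].astype(float)

# inverse[i] is the index into x_unique of x[i]; counts holds the group sizes
x_unique, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)

# Per-group sum of y divided by group size gives the group means
y_means = np.bincount(inverse, weights=y) / counts

# Population standard deviation (ddof=0), matching the default of np.std
sq_dev = (y - y_means[inverse]) ** 2
y_stds = np.sqrt(np.bincount(inverse, weights=sq_dev) / counts)

print(np.column_stack((x_unique, y_means, y_stds)))
```

This computes the same population standard deviation as np.std with its default ddof=0; for the sample standard deviation, the divisor would need to be counts - 1, which is undefined for groups with a single element.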

answered Nov 14 '22 23:11

Peter
