Averaging Data in Bins

Question

I have two lists, one of depths and one of chlorophyll values, where each chlorophyll value corresponds to the depth at the same index. I want to average the chlorophyll data over every 0.5 m of depth.

chl  = [0.4,0.1,0.04,0.05,0.4,0.2,0.6,0.09,0.23,0.43,0.65,0.22,0.12,0.2,0.33]
depth = [0.1,0.3,0.31,0.44,0.49,1.1,1.145,1.33,1.49,1.53,1.67,1.79,1.87,2.1,2.3]

The depth bins do not all contain the same number of samples, and the depths do not always start at 0.0 or fall on 0.5 m intervals. Each chlorophyll value always corresponds to the depth value at the same index, though. The chlorophyll averages must not be sorted in ascending order; they need to stay in order of depth. The depth and chlorophyll lists are very long, so I can't do this by hand.

How would I make 0.5 m depth bins with averaged chlorophyll data in them?

Goal:

depth = [0.5,1.0,1.5,2.0,2.5]
chlorophyll = [avg1,avg2,avg3,avg4,avg5]

For example:

avg1 = np.mean([0.4,0.1,0.04,0.05,0.4])

miradulo · Accepted Answer

I'm surprised that scipy.stats.binned_statistic hasn't been mentioned yet. You can calculate the mean directly with it, and specify the bins with optional parameters.

from scipy.stats import binned_statistic

mean_stat = binned_statistic(depth, chl, 
                             statistic='mean', 
                             bins=5, 
                             range=(0, 2.5))

mean_stat.statistic
# array([0.198,   nan, 0.28 , 0.355, 0.265])
mean_stat.bin_edges
# array([0. , 0.5, 1. , 1.5, 2. , 2.5])
mean_stat.binnumber
# array([1, 1, 1, ..., 4, 5, 5])
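When the deepest sample is not known in advance, the bin edges can be derived from the data instead of hard-coding bins=5 and range=(0, 2.5). The following is a minimal sketch under that assumption; the names bin_width, n_bins, edges and depth_labels are illustrative and not part of the answer above. It labels each bin by its deeper edge, as in the question's goal.

```python
import numpy as np
from scipy.stats import binned_statistic

chl = [0.4, 0.1, 0.04, 0.05, 0.4, 0.2, 0.6, 0.09, 0.23,
       0.43, 0.65, 0.22, 0.12, 0.2, 0.33]
depth = [0.1, 0.3, 0.31, 0.44, 0.49, 1.1, 1.145, 1.33, 1.49,
         1.53, 1.67, 1.79, 1.87, 2.1, 2.3]

bin_width = 0.5  # assumed bin size, taken from the question

# Enough bins to reach the first multiple of bin_width at or beyond the deepest sample
n_bins = int(np.ceil(max(depth) / bin_width))
edges = np.arange(n_bins + 1) * bin_width

# Passing a sequence as `bins` sets the edges explicitly; empty bins come back as NaN
means, bin_edges, binnumber = binned_statistic(depth, chl,
                                               statistic='mean',
                                               bins=edges)

# Label every bin by its deeper (right-hand) edge: 0.5, 1.0, 1.5, ...
depth_labels = bin_edges[1:]

for d, m in zip(depth_labels, means):
    print(f"{d:.1f} m: {m:.3f}")
```

The results stay in depth order because the bins are defined by the edges, not by the chlorophyll values. All bins are half-open on the right except the last one, which also includes its right edge.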

Divakar · Answer

Here's a vectorized NumPy solution that uses np.searchsorted to find the indices where the bins change and np.add.reduceat to compute the per-bin sums. It assumes depth is sorted in ascending order:

import numpy as np

def bin_data(chl, depth, bin_start=0, bin_length=0.5):
    # Number of bins needed to reach the deepest sample, then the bin edges
    # spaced bin_length apart starting at bin_start
    n = int(np.ceil((depth[-1] - bin_start)/bin_length))
    depthl = np.linspace(start=bin_start, stop=bin_start + bin_length*n, num=n+1)

    # Indices into the (sorted) depth array where each bin edge would be inserted
    idx = np.searchsorted(depth, depthl)

    # Number of elements in each bin
    lens = np.diff(idx)

    # Sum each bin and divide by its element count to get the binned average.
    # Bins with no elements are marked as NaN
    return np.where(lens==0, np.nan, np.add.reduceat(chl, idx[:-1])/lens)

Sample run:

In [83]: chl
Out[83]: 
array([0.4 , 0.1 , 0.04, 0.05, 0.4 , 0.2 , 0.6 , 0.09, 0.23, 0.43, 0.65,
       0.22, 0.12, 0.2 , 0.33])

In [84]: depth
Out[84]: 
array([0.1  , 0.3  , 0.31 , 0.44 , 0.49 , 1.1  , 1.145, 1.33 , 1.49 ,
       1.53 , 1.67 , 1.79 , 1.87 , 2.1  , 2.3  ])

In [85]: bin_data(chl, depth, bin_start=0, bin_length= 0.5)
Out[85]: array([0.198,   nan, 0.28 , 0.355, 0.265])
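To see why the empty bin has to be masked explicitly, it can help to look at the intermediate arrays for the sample data. The sketch below hard-codes the bin edges as an assumption and uses np.divide with a where mask, which is a variation on the np.where call in bin_data and not the answer's exact code.

```python
import numpy as np

chl = np.array([0.4, 0.1, 0.04, 0.05, 0.4, 0.2, 0.6, 0.09, 0.23,
                0.43, 0.65, 0.22, 0.12, 0.2, 0.33])
depth = np.array([0.1, 0.3, 0.31, 0.44, 0.49, 1.1, 1.145, 1.33, 1.49,
                  1.53, 1.67, 1.79, 1.87, 2.1, 2.3])
edges = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])  # assumed 0.5 m bin edges

# Position in the sorted depth array where each edge would be inserted
idx = np.searchsorted(depth, edges)   # [0, 5, 5, 9, 13, 15]

# Consecutive differences give the number of samples per bin
counts = np.diff(idx)                 # [5, 0, 4, 4, 2]

# reduceat sums chl[idx[i]:idx[i+1]]; when two indices are equal (an empty bin)
# it returns the single element chl[idx[i]] instead of 0
sums = np.add.reduceat(chl, idx[:-1])

# Divide only where a bin has samples; empty bins keep the NaN fill value
means = np.full(counts.shape, np.nan)
np.divide(sums, counts, out=means, where=counts > 0)

print(idx, counts, means)
```

The second bin (0.5 to 1.0 m) has a count of 0, yet reduceat reports chl[5] = 0.2 for it, so dividing without the mask would produce a misleading value rather than NaN.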

Tags: python, python-3.x, numpy, average, scientific-computing

Adam
