Averaging Data in Bins

I have two lists: one is a depth list and the other is a chlorophyll list, and they correspond to each other element by element. I want to average the chlorophyll data over every 0.5 m of depth.

chl  = [0.4,0.1,0.04,0.05,0.4,0.2,0.6,0.09,0.23,0.43,0.65,0.22,0.12,0.2,0.33]
depth = [0.1,0.3,0.31,0.44,0.49,1.1,1.145,1.33,1.49,1.53,1.67,1.79,1.87,2.1,2.3]

The depth bins do not always contain the same number of samples, and the depths do not always start at 0.0 or fall on 0.5 m intervals. Each chlorophyll value always corresponds to the depth value at the same index, though. The chlorophyll averages also must not be sorted into ascending order; they need to stay in order of depth. Both lists are very long, so I can't do this by hand.

How would I make 0.5 m depth bins with averaged chlorophyll data in them?

Goal:

depth = [0.5,1.0,1.5,2.0,2.5]
chlorophyll = [avg1,avg2,avg3,avg4,avg5]

For example:

avg1 = np.mean([0.4,0.1,0.04,0.05,0.4])
Adam asked Apr 15 '18 16:04


2 Answers

I'm surprised that scipy.stats.binned_statistic hasn't been mentioned yet. You can calculate the mean directly with it and control the binning through its bins and range parameters. Bins that contain no samples come back as nan.

from scipy.stats import binned_statistic

mean_stat = binned_statistic(depth, chl, 
                             statistic='mean', 
                             bins=5, 
                             range=(0, 2.5))

mean_stat.statistic
# array([0.198,   nan, 0.28 , 0.355, 0.265])
mean_stat.bin_edges
# array([0. , 0.5, 1. , 1.5, 2. , 2.5])
mean_stat.binnumber
# array([1, 1, 1, ..., 4, 5, 5])
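
To get from this result to the two lists in the question's goal, one option is to label each bin by its upper edge. The following is only a sketch: it assumes bins start at 0 and that the edges should be derived from the deepest sample, and the names bin_length, edges, bin_depth, bin_chl and has_data are mine, not part of SciPy.

```python
import numpy as np
from scipy.stats import binned_statistic

chl = [0.4, 0.1, 0.04, 0.05, 0.4, 0.2, 0.6, 0.09, 0.23, 0.43, 0.65, 0.22, 0.12, 0.2, 0.33]
depth = [0.1, 0.3, 0.31, 0.44, 0.49, 1.1, 1.145, 1.33, 1.49, 1.53, 1.67, 1.79, 1.87, 2.1, 2.3]

bin_length = 0.5
# Assumption: bins start at 0 and run to the first multiple of bin_length
# at or beyond the deepest sample.
n_bins = int(np.ceil(max(depth) / bin_length))
edges = bin_length * np.arange(n_bins + 1)

# Passing a sequence as `bins` makes binned_statistic use it as the bin edges.
result = binned_statistic(depth, chl, statistic='mean', bins=edges)

# Label each bin by its upper edge; the averages stay in depth order.
bin_depth = result.bin_edges[1:]
bin_chl = result.statistic

# Optional: mask marking which bins actually contain data.
has_data = ~np.isnan(bin_chl)
```

Here bin_depth and bin_chl line up index by index, and bin_depth[has_data], bin_chl[has_data] would drop the empty 0.5 to 1.0 m bin if NaN values are not wanted.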
miradulo answered Sep 18 '22 23:09


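Here's a vectorized NumPy solution that uses np.searchsorted to find the index in depth where each bin starts and np.add.reduceat to sum the values within each bin. It assumes depth is sorted in ascending order, which np.searchsorted requires: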

import numpy as np

def bin_data(chl, depth, bin_start=0, bin_length=0.5):
    # Number of bins needed to cover depths from bin_start to the deepest sample
    n = int(np.ceil((depth[-1] - bin_start) / bin_length))
    # Bin edges spaced exactly bin_length apart, starting at bin_start
    depthl = bin_start + bin_length * np.arange(n + 1)

    # Indices into the (ascending) depth array where each bin begins
    idx = np.searchsorted(depth, depthl)

    # Number of elements in each bin
    lens = np.diff(idx)

    # Sum each bin, then divide by its count to get the binned average.
    # Empty bins (count == 0) are reported as NaN; errstate silences the
    # divide-by-zero warning those bins would otherwise trigger.
    sums = np.add.reduceat(chl, idx[:-1])
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(lens == 0, np.nan, sums / lens)

Sample run:

In [83]: chl
Out[83]: 
array([0.4 , 0.1 , 0.04, 0.05, 0.4 , 0.2 , 0.6 , 0.09, 0.23, 0.43, 0.65,
       0.22, 0.12, 0.2 , 0.33])

In [84]: depth
Out[84]: 
array([0.1  , 0.3  , 0.31 , 0.44 , 0.49 , 1.1  , 1.145, 1.33 , 1.49 ,
       1.53 , 1.67 , 1.79 , 1.87 , 2.1  , 2.3  ])

In [85]: bin_data(chl, depth, bin_start=0, bin_length= 0.5)
Out[85]: array([0.198,   nan, 0.28 , 0.355, 0.265])
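
For a quick sanity check on small inputs, a plain-Python loop can serve as a reference to compare against. This is a sketch rather than a replacement for the vectorized version: bin_means is a hypothetical helper name, it uses only the standard library, and it omits empty bins instead of reporting NaN.

```python
import math
from collections import defaultdict

def bin_means(depth, chl, bin_start=0.0, bin_length=0.5):
    """Return {bin_index: mean chl}.

    Bin i covers [bin_start + i*bin_length, bin_start + (i+1)*bin_length).
    Bins without samples do not appear in the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, c in zip(depth, chl):
        i = math.floor((d - bin_start) / bin_length)  # index of the bin holding d
        sums[i] += c
        counts[i] += 1
    # Sorting the keys keeps the averages in depth order.
    return {i: sums[i] / counts[i] for i in sorted(sums)}

chl = [0.4, 0.1, 0.04, 0.05, 0.4, 0.2, 0.6, 0.09, 0.23, 0.43, 0.65, 0.22, 0.12, 0.2, 0.33]
depth = [0.1, 0.3, 0.31, 0.44, 0.49, 1.1, 1.145, 1.33, 1.49, 1.53, 1.67, 1.79, 1.87, 2.1, 2.3]

means = bin_means(depth, chl)
```

With the sample data, means has entries for bins 0, 2, 3 and 4 only, matching the non-NaN positions in the output above.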
Divakar answered Sep 19 '22 23:09