I have two lists: one is a depth list and the other is a chlorophyll list, and they correspond to each other element by element. I want to average the chlorophyll data over every 0.5 m of depth.
chl = [0.4,0.1,0.04,0.05,0.4,0.2,0.6,0.09,0.23,0.43,0.65,0.22,0.12,0.2,0.33]
depth = [0.1,0.3,0.31,0.44,0.49,1.1,1.145,1.33,1.49,1.53,1.67,1.79,1.87,2.1,2.3]
The depth bins are not always equal in length, and the depths do not always start at 0.0 or fall on 0.5 intervals. Each chlorophyll value always corresponds to the depth value at the same position, though. The chlorophyll averages also must not be sorted in ascending order; they need to stay in the correct order according to depth. The depth and chlorophyll lists are very long, so I can't do this by hand.
How would I make 0.5 m depth bins with averaged chlorophyll data in them?
Goal:
depth = [0.5,1.0,1.5,2.0,2.5]
chlorophyll = [avg1,avg2,avg3,avg4,avg5]
For example:
avg1 = np.mean([0.4,0.1,0.04,0.05,0.4])
I'm surprised that scipy.stats.binned_statistic hasn't been mentioned yet. You can calculate the mean directly with it, and specify the bins with optional parameters.
from scipy.stats import binned_statistic
mean_stat = binned_statistic(depth, chl,
                             statistic='mean',
                             bins=5,
                             range=(0, 2.5))
mean_stat.statistic
# array([0.198, nan, 0.28 , 0.355, 0.265])
mean_stat.bin_edges
# array([0. , 0.5, 1. , 1.5, 2. , 2.5])
mean_stat.binnumber
# array([1, 1, 1, ..., 4, 5, 5])
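Since the real lists are long and the maximum depth may not be known in advance, the bin edges can also be derived from the data instead of hard-coding bins=5 and range=(0, 2.5). The following is only a sketch under some assumptions: bins start at 0, depths are non-negative, and each bin is labelled by its upper edge to match the "Goal" in the question. The names bin_length, edges, mean_chl and bin_depth are illustrative and not part of the original answer.

```python
import numpy as np
from scipy.stats import binned_statistic

chl = [0.4, 0.1, 0.04, 0.05, 0.4, 0.2, 0.6, 0.09, 0.23, 0.43, 0.65, 0.22, 0.12, 0.2, 0.33]
depth = [0.1, 0.3, 0.31, 0.44, 0.49, 1.1, 1.145, 1.33, 1.49, 1.53, 1.67, 1.79, 1.87, 2.1, 2.3]

bin_length = 0.5  # assumed bin width in metres

# Enough bins to cover the deepest reading (assumes bins start at 0)
n_bins = int(np.ceil(max(depth) / bin_length))
edges = bin_length * np.arange(n_bins + 1)

# Passing a sequence as `bins` makes binned_statistic use it as explicit edges
mean_chl, bin_edges, _ = binned_statistic(depth, chl, statistic='mean', bins=edges)

# Label each bin by its upper edge, as in the question's goal
bin_depth = bin_edges[1:]

print(bin_depth)
print(mean_chl)
```

Bins with no readings come back as NaN, so the averages stay aligned with bin_depth and remain in depth order.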
Here's a vectorized NumPy solution using np.searchsorted to get the bin boundaries (indices) and np.add.reduceat for the binned summations. It assumes depth is sorted in ascending order, as in the question -
import numpy as np

def bin_data(chl, depth, bin_start=0, bin_length=0.5):
    # Number of bins needed to cover the data from bin_start to the deepest reading
    n = int(np.ceil((depth[-1] - bin_start)/bin_length))
    # Bin edges spaced bin_length apart, starting at bin_start
    depthl = np.linspace(start=bin_start, stop=bin_start + bin_length*n, num=n+1)
    # Indices into depth (assumed sorted ascending) where each bin starts
    idx = np.searchsorted(depth, depthl)
    # Number of elements in each bin
    lens = np.diff(idx)
    # Sum each bin and divide by its count to get the binned average.
    # Empty bins (count 0) are reported as NaN.
    return np.where(lens==0, np.nan, np.add.reduceat(chl, idx[:-1])/lens)
Sample run -
In [83]: chl
Out[83]:
array([0.4 , 0.1 , 0.04, 0.05, 0.4 , 0.2 , 0.6 , 0.09, 0.23, 0.43, 0.65,
       0.22, 0.12, 0.2 , 0.33])
In [84]: depth
Out[84]:
array([0.1  , 0.3  , 0.31 , 0.44 , 0.49 , 1.1  , 1.145, 1.33 , 1.49 ,
       1.53 , 1.67 , 1.79 , 1.87 , 2.1  , 2.3  ])
In [85]: bin_data(chl, depth, bin_start=0, bin_length= 0.5)
Out[85]: array([0.198, nan, 0.28 , 0.355, 0.265])
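The searchsorted approach relies on depth being sorted. If that cannot be guaranteed, one alternative (a different technique from the answer above, shown only as a sketch) is to compute a bin index per reading and accumulate with np.bincount. The function name bin_means_unsorted is hypothetical, and the sketch assumes non-negative depths and bins starting at 0.

```python
import numpy as np

def bin_means_unsorted(chl, depth, bin_length=0.5):
    """Binned means that do not require depth to be sorted (assumes depth >= 0)."""
    chl = np.asarray(chl, dtype=float)
    depth = np.asarray(depth, dtype=float)
    # Integer bin index per reading: [0, 0.5) -> 0, [0.5, 1.0) -> 1, ...
    bin_idx = np.floor(depth / bin_length).astype(int)
    n_bins = bin_idx.max() + 1
    # Per-bin sums and counts
    sums = np.bincount(bin_idx, weights=chl, minlength=n_bins)
    counts = np.bincount(bin_idx, minlength=n_bins)
    # 0/0 for empty bins yields NaN; silence the accompanying warning
    with np.errstate(invalid='ignore', divide='ignore'):
        means = sums / counts
    # Label each bin by its upper edge
    upper_edges = bin_length * np.arange(1, n_bins + 1)
    return upper_edges, means

chl = [0.4, 0.1, 0.04, 0.05, 0.4, 0.2, 0.6, 0.09, 0.23, 0.43, 0.65, 0.22, 0.12, 0.2, 0.33]
depth = [0.1, 0.3, 0.31, 0.44, 0.49, 1.1, 1.145, 1.33, 1.49, 1.53, 1.67, 1.79, 1.87, 2.1, 2.3]

bin_depth, bin_chl = bin_means_unsorted(chl, depth)
print(bin_depth)
print(bin_chl)
```

Unlike scipy.stats.binned_statistic, a reading that falls exactly on the last upper edge opens a new bin here rather than being included in the last one.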