I have an N-by-2 array of N 2D points, which I want to assign to an M-by-K grid of bins.
For instance, points [m + 0.1, k]
and [m + 0.1, k + 0.9]
should fall into bin [m, k]
, where both m
and k
are integers. It's possible that a point doesn't fall into any of the bins.
Implementation-wise, I want the result stored in a logical M-by-K-by-N array in_bin
, where in_bin[m, k, n]
is True
, if point n
falls into bin [m, k]
.
Here is how I'm doing this, naively with double loops.
M = 10
K = 11
N = 100
pts = 20 * np.random.rand(N, 2)
in_bin = np.zeros((M, K, N), dtype=bool)
for m in range(M):
for k in range(K):
inbin_h = (pts[:, 0] >= m) & (pts[:, 0] < (m + 1))
inbin_w = (pts[:, 1] >= k) & (pts[:, 1] < (k + 1))
in_bin[m, k, np.where(inbin_h & inbin_w)[0]] = True
The where
isn't actually needed (not that this changes speed much):
In [120]: in_bin1 = np.zeros((M, K, N), dtype=bool)
...: for m in range(M):
...: for k in range(K):
...: inbin_h = (pts[:, 0] >= m) & (pts[:, 0] < (m + 1))
...: inbin_w = (pts[:, 1] >= k) & (pts[:, 1] < (k + 1))
...: in_bin1[m, k, inbin_h & inbin_w] = True
But we can do the allocation for all m
and k
at once:
In [125]: x0=(pts[:,0]>=np.arange(M)[:,None]) & (pts[:,0]<np.arange(1,M+1)[:,None]);
In [126]: x1=(pts[:,1]>=np.arange(K)[:,None]) & (pts[:,1]<np.arange(1,K+1)[:,None]);
In [127]: x0.shape
Out[127]: (10, 100)
In [128]: x1.shape
Out[128]: (11, 100)
combine these with broadcasting:
In [129]: xx = x0[:,None,:] & x1[None,:,:]
In [130]: xx.shape
Out[130]: (10, 11, 100)
In [131]: np.allclose(in_bin1, xx) # and check
Out[131]: True
You can do this using histogram2d
as follows:
hist = np.dstack(np.histogram2d([pts[i,0]],[pts[i,1]],bins=[np.arange(M+1),np.arange(K+1)])[0] for i in range(len(pts)))
which only involves a single for loop over the number of points. If N is much smaller than M*K this should be faster.
Here is another method without any for loops using searchsorted
which is what histogram2d
uses:
def bin_points(pts, m, k):
binsx = np.arange(m+1)
binsy = np.arange(k+1)
index_x = np.searchsorted(binsx,pts[:,0]) - 1
index_y = np.searchsorted(binsy,pts[:,1]) - 1
# mask out values which fall outside the bins
mask = (index_x >= 0) & (index_x < m) & (index_y >= 0) & (index_y < k)
index_x = index_x[mask]
index_y = index_y[mask]
n = np.arange(pts.shape[0])[mask]
in_bin = np.zeros((M, K, pts.shape[0]), dtype=bool)
in_bin[index_x,index_y,n] = 1
Here are some benchmarks:
M = 10, K = 11, N = 100
In [2]: %timeit bin_points(pts,M,K)
10000 loops, best of 3: 34.1 µs per loop
In [3]: %timeit bin_points_double_for_loop(pts,M,K)
1000 loops, best of 3: 1.71 ms per loop
In [4]: %timeit bin_points_broadcast(pts,M,K)
10000 loops, best of 3: 39.6 µs per loop
M = 100, K = 110, N = 1000
In [2]: %timeit bin_points(pts,M,K)
1000 loops, best of 3: 721 µs per loop
In [3]: %timeit bin_points_double_for_loop(pts,M,K)
1 loop, best of 3: 249 ms per loop
In [4]: %timeit bin_points_broadcast(pts,M,K)
100 loops, best of 3: 3.04 ms per loop
We are checking if those floating pt numbers in pts
are in each integer bin at each iteration. So, the trick we could use would be to convert those floating pt numbers to their floored-down numbers. Additionally, we need to mask out the valid ones that satisfy the range(M)
and range(N)
. That's all there is!
Here's the implementation -
def binpts(pts, M, K):
N = len(pts)
in_bin_out = np.zeros((M, K, N), dtype=bool)
mask = (pts[:,0]<M) & (pts[:,1]<K)
pts_f = pts[mask]
r,c = pts_f.astype(int).T
in_bin_out[r, c, mask] = 1
return in_bin_out
Timings on large arrays with the range in the floating pts array proportional to its size as provided in given sample -
Case #1 :
In [2]: M = 100
...: K = 101
...: N = 10000
...: np.random.seed(0)
...: pts = 2000 * np.random.rand(N, 2)
# @hpaulj's soln
In [3]: %%timeit
...: x0=(pts[:,0]>=np.arange(M)[:,None]) & (pts[:,0]<np.arange(1,M+1)[:,None])
...: x1=(pts[:,1]>=np.arange(K)[:,None]) & (pts[:,1]<np.arange(1,K+1)[:,None])
...: xx = x0[:,None,:] & x1[None,:,:]
10 loops, best of 3: 47.5 ms per loop
# @user545424's soln
In [6]: %timeit bin_points(pts,M,K)
1000 loops, best of 3: 331 µs per loop
In [7]: %timeit binpts(pts,M,K)
10000 loops, best of 3: 125 µs per loop
Note : @hpaulj's soln is memory intensive and I ran out of memory using it on larger ones.
Case #2 :
In [8]: M = 100
...: K = 101
...: N = 100000
...: np.random.seed(0)
...: pts = 20000 * np.random.rand(N, 2)
In [9]: %timeit bin_points(pts,M,K)
...: %timeit binpts(pts,M,K)
100 loops, best of 3: 2.31 ms per loop
1000 loops, best of 3: 585 µs per loop
Case #3 :
In [10]: M = 100
...: K = 101
...: N = 1000000
...: np.random.seed(0)
...: pts = 200000 * np.random.rand(N, 2)
In [11]: %timeit bin_points(pts,M,K)
...: %timeit binpts(pts,M,K)
10 loops, best of 3: 34.6 ms per loop
100 loops, best of 3: 2.78 ms per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With