I have a large image which contains "blobs" of interest over a background. I have the position (centroid, bounding box, area) of all blobs. I want to crop a limited number of regions of a fixed size in the image that allow me to capture most of the blobs. Example below for 1, 2 or 3 cropped regions in the same image.
This example shows that cropping 1 region (in red) is relatively easy: just pick the region with as many blobs as possible. This can be solved by just trying everything or possibly by computing a density of blobs using a kernel density estimator or something like this.
But cropping 2 regions (in dashed blue) is not just cropping the next best crop after the first selected above. It is a new problem where I need to find the optimal combination of 2 crops. Trying all combinations of 2 crops (brute force) probably becomes too computationally expensive (I have many images to process and they are large).
Similarly, cropping 3 regions (in green) is a new problem, and one for which brute force is even less suited. In that particular example 2 of the 3 regions are identical to the blue case and a new one is added but this is not the general case (I just wanted to show a slightly complex scenario).
I have no idea regarding the algorithm to solve the n-crops case. I am wondering if there is a theoretical/well known solution to this problem.
In addition:
Thanks in advance for any pointer.
PS: The coding language does not really matter here since the core of the algorithm (finding the optimal position of the crops given the location/size of blobs) only needs small arrays (position/size of ~100 blobs per image) to be computed. I'll probably use Python or R.
If the blobs are relatively small as shown in the image, you could run k-Means Clustering with the blob center x,y pairs. The python scikit-learn package is pretty mature and should work well: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html (function fit_predict of the KMeans classifier)
k is an input and denotes the number of clusters you want. The algorithm will split the blobs (samples) into k clusters (sets). You could then compute the x,y frame (min-x,max-x,min-y,max-y) of each set and also include the blobs individual sizes or just take their max if they are fairly small.
You could then sort the clusters by their #blobs/frame-area ratio and sum them e.g. until either enough blobs are covered (done) - or your total area becomes too big (in that case re-run with larger k).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With