Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for optimal placement of cropped regions to capture features (blobs) in an image

I have a large image which contains "blobs" of interest over a background. I have the position (centroid, bounding box, area) of all blobs. I want to crop a limited number of regions of a fixed size in the image that allow me to capture most of the blobs. Example below for 1, 2 or 3 cropped regions in the same image.

This example shows that cropping 1 region (in red) is relatively easy: just pick the region with as many blobs as possible. This can be solved by just trying everything or possibly by computing a density of blobs using a kernel density estimator or something like this.

But cropping 2 regions (in dashed blue) is not just cropping the next best crop after the first selected above. It is a new problem where I need to find the optimal combination of 2 crops. Trying all combinations of 2 crops (brute force) probably becomes too computationally expensive (I have many images to process and they are large).

Similarly, cropping 3 regions (in green) is a new problem, and one for which brute force is even less suited. In that particular example 2 of the 3 regions are identical to the blue case and a new one is added but this is not the general case (I just wanted to show a slightly complex scenario).

I have no idea regarding the algorithm to solve the n-crops case. I am wondering if there is a theoretical/well known solution to this problem.

In addition:

  • the geometry of the problem is approximately that of the example above (maximum two crops over the height of the image, many crops over the width); in case that simplifies things
  • crops should not intersect
  • blobs should be as centred as possible in the crop (i.e. https://dl.dropboxusercontent.com/u/1047321/SO_crop/cutout_one_blob.png )
  • crops should stay within the original image boundaries (cf. either example above)
  • the area of the blob should be taken into account (I am more interested in large blobs than small ones); but that can probably be introduced in any algorithm by associating a weight with each blob to compute the score of each cropping layout.
  • it is OK to leave some blobs out. Actually, I'll probably compute a cost-complexity parameter such as how many new blobs adding a crop would get and set a threshold under which I stop adding crops.

Thanks in advance for any pointer.

PS: The coding language does not really matter here since the core of the algorithm (finding the optimal position of the crops given the location/size of blobs) only needs small arrays (position/size of ~100 blobs per image) to be computed. I'll probably use Python or R.

like image 975
Jean-Olivier Irisson Avatar asked Apr 21 '15 15:04

Jean-Olivier Irisson


1 Answers

If the blobs are relatively small as shown in the image, you could run k-Means Clustering with the blob center x,y pairs. The python scikit-learn package is pretty mature and should work well: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html (function fit_predict of the KMeans classifier)

k is an input and denotes the number of clusters you want. The algorithm will split the blobs (samples) into k clusters (sets). You could then compute the x,y frame (min-x,max-x,min-y,max-y) of each set and also include the blobs individual sizes or just take their max if they are fairly small.

You could then sort the clusters by their #blobs/frame-area ratio and sum them e.g. until either enough blobs are covered (done) - or your total area becomes too big (in that case re-run with larger k).

like image 98
midin Avatar answered Oct 10 '22 06:10

midin