 

SIFT match computer vision

I need to determine the locations of yogurts in a supermarket. The source photo looks like this:

[source photo]

With this template:

[template image]

I am using SIFT to extract the key points of the template:

import glob

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
from skimage.measure import ransac
from skimage.transform import AffineTransform

img1 = cv.imread('train.jpg')           # query image (shelf photo)
sift = cv.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)

paths = glob.glob("template.jpg")       # template image(s)
best_count = 0

for img in paths:
    img2 = cv.imread(img)               # template image
    # find the keypoints and descriptors with SIFT
    kp2, des2 = sift.detectAndCompute(img2, None)

    # FLANN parameters
    FLANN_INDEX_KDTREE = 1
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)     # or pass an empty dictionary
    flann = cv.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des1, des2, k=2)
    # remember the template with the most raw matches
    if best_count < len(matches):
        best_count = len(matches)
        image = img2
        match = matches

    h_query, w_query, _ = img2.shape

    # ratio test as per Lowe's paper; the mask marks good matches for drawing
    matchesMask = [[0, 0] for _ in range(len(match))]
    good_matches = []
    good_matches_indices = {}
    for i, (m, n) in enumerate(match):
        if m.distance < 0.7 * n.distance:
            matchesMask[i] = [1, 0]
            good_matches.append(m)
            good_matches_indices[len(good_matches) - 1] = i

    
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good_matches]).reshape(-1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good_matches]).reshape(-1, 2)

    # fit a single affine model with RANSAC (scikit-image)
    model, inliers = ransac(
        (src_pts, dst_pts),
        AffineTransform, min_samples=4,
        residual_threshold=4, max_trials=20000
    )

    n_inliers = np.sum(inliers)
    print(n_inliers)
    matched_indices = [good_matches_indices[idx] for idx in inliers.nonzero()[0]]
    print(matched_indices)

    # map the template corners through the inverse model
    q_coordinates = np.array([(0, 0), (h_query, w_query)])
    coords = model.inverse(q_coordinates)
    print(coords)

    M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC, 2)

    draw_params = dict(matchColor=(0, 255, 0),
                       singlePointColor=(255, 0, 0),
                       matchesMask=matchesMask,
                       flags=cv.DrawMatchesFlags_DEFAULT)

    img3 = cv.drawMatchesKnn(img1, kp1, image, kp2, match, None, **draw_params)
    plt.imshow(img3), plt.show()

The result of the SIFT matching looks like this:

[matching result]

The question is: what is the best way to cluster the matched points in order to obtain rectangles representing each yogurt? I tried RANSAC, but that method doesn't work in this case.

asked by Ivan Anisimov


1 Answer

I am proposing an approach based on what is discussed in this paper. I have modified the approach a bit because the use case is not entirely the same, but the authors do use SIFT feature matching to locate multiple objects in video frames. They use PCA to reduce matching time, but that may not be required for still images.

Sorry, I could not write complete code for this as it would take a lot of time, but I believe the approach should locate all occurrences of the template object; rough illustrative sketches are included after the steps below.

The modified approach is as follows:

Divide the template image into regions: left, middle and right along the horizontal axis, and top and bottom along the vertical axis.

Now when you match features between the template and the source image, you will get keypoints from some of these regions matched at multiple locations in the source image. You can use these keypoints to identify which region of the template is present at which location(s) in the source image. If regions overlap, i.e. keypoints from different template regions match close keypoints in the source image, that indicates a wrong match.
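
A minimal sketch of this labelling step (my own illustration, not code from the paper; the file names follow the question, the region_label helper is hypothetical, and the thirds/halves split is just one way to divide the template):

import cv2 as cv

# label a template point by the region it falls in
def region_label(pt, w, h):
    x, y = pt
    horiz = 'left' if x < w / 3 else ('middle' if x < 2 * w / 3 else 'right')
    vert = 'top' if y < h / 2 else 'bottom'
    return horiz, vert

template = cv.imread('template.jpg')
source = cv.imread('train.jpg')
h_t, w_t = template.shape[:2]

sift = cv.SIFT_create()
kp_t, des_t = sift.detectAndCompute(template, None)
kp_s, des_s = sift.detectAndCompute(source, None)

flann = cv.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = flann.knnMatch(des_t, des_s, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

# each entry: matched source keypoint location plus its template region
labelled = [(kp_s[m.trainIdx].pt, region_label(kp_t[m.queryIdx].pt, w_t, h_t))
            for m in good]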

Mark each set of matching keypoints within a neighborhood of the source image as left, middle, right, top or bottom, depending on whether the majority of its matches come from keypoints of that particular region of the template image.

Starting from each left region in the source image, move towards the right; if we find a middle region followed by a right region, then the area of the source image between the regions marked left and right can be taken as the location of one template object (a sketch of this scan follows the steps below).

There could be overlapping objects, which would result in a left region followed by another left region when moving right. The area between the two left regions can then be marked as one template object.

For further refinement, each area of the source image marked as one template object can be cropped and re-matched with the template image.
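
Continuing the sketch above, one way to form the neighborhoods is an off-the-shelf clustering such as DBSCAN (my choice, not from the paper; the eps value is a guess, and for brevity the scan only uses the horizontal labels):

from collections import Counter

import numpy as np
from sklearn.cluster import DBSCAN

pts = np.array([p for p, _ in labelled])
horiz_labels = [h for _, (h, _) in labelled]

clustering = DBSCAN(eps=40, min_samples=3).fit(pts)

clusters = []                                  # (mean x, mean y, majority label)
for cid in set(clustering.labels_) - {-1}:     # -1 marks noise points
    idx = np.where(clustering.labels_ == cid)[0]
    majority = Counter(horiz_labels[i] for i in idx).most_common(1)[0][0]
    cx, cy = pts[idx].mean(axis=0)
    clusters.append((cx, cy, majority))

# scan left to right: a left cluster closed by a right cluster is one object;
# a left cluster followed by another left cluster means overlapping objects
clusters.sort(key=lambda c: c[0])
boxes, start = [], None
for cx, cy, lab in clusters:
    if lab == 'left':
        if start is not None:
            boxes.append((start, cx))          # overlapping case
        start = cx
    elif lab == 'right' and start is not None:
        boxes.append((start, cx))
        start = None
print(boxes)                                   # x-ranges of candidate objects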

answered by abggcv