Well, the question is simple: I want to find similar images given a query image, similar to what TinEye does. Suppose I have a shirt with the following description:
Sleeve length : full
Collar : present
Pattern : striped
(The above data is just to give you a feel of the image - I actually don't have this data.)
The first image is the query image and the next ones should be the output of the similarity-finding algorithm. So based on the example we have some flexibility: we can show the user an image with a changed color, but all the images should have the same pattern, the same collar type and the same sleeve length. So I have to show outputs which are visually similar.
There are similar threads on Stack Overflow (link from Stack), and not only this one - there are many others. But I am confused about the approach to follow.
In my case I don't have to search in another category; I have to search in the same category, i.e. if the input is a shirt I will search in the shirt category only. That part has been done.
So the question is: what are the approaches to handle this problem? For the color it is no big issue - color information can easily be extracted through a color histogram. Let's say the input is a round-neck T-shirt, i.e. without a collar, half sleeve, and printed at the center with text. Now the output should be similar images: half sleeve, round neck, and printed text at the center, though the text may vary. I tried K-Means clustering and p-hash but that didn't work. Please enlighten me.
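For reference, a minimal sketch of the histogram-based color comparison I have in mind could look like this (the filenames are placeholders; cv2.HISTCMP_CORREL assumes OpenCV 3+):

import cv2

def color_histogram(path):
    #2D hue/saturation histogram in HSV space, normalized so images of different sizes can be compared
    img = cv2.imread(path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
    return cv2.normalize(hist, hist)

#correlation close to 1.0 -> very similar color distribution
similarity = cv2.compareHist(color_histogram('query.jpg'), color_histogram('candidate.jpg'), cv2.HISTCMP_CORREL)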
PS: I have to find similar images, not duplicates.
I would try to split this problem into 3 smaller problems:
- checking whether the image shows a shirt with long or short sleeves
- checking the pattern (striped, plain, something else?)
- determining the color of the shirt

Checking whether the image shows a shirt with long or short sleeves

This one is in my opinion the easiest. You mentioned that you have the category name, but based on Google Images it seems that it may not be obvious whether a shirt or T-shirt has long or short sleeves.
My solution is quite simple:

1. Find the face on the image using a Haar cascade.
2. Use the GrabCut algorithm (initialized with the face rectangle) to obtain a mask of the face.
3. Convert the image to HSV and calculate the H/S histogram of the masked face region.
4. Calculate the back projection of this histogram over the whole image - it finds all pixels with a color similar to the face color, i.e. skin.
5. Threshold the back projection to remove noise.

The final result of this algorithm is a black and white image which shows the skin regions. Using this image you can count the skin pixels and check whether skin is visible only on the face or somewhere else as well. You can try to find contours as well - generally both solutions will give you a way to check whether the hands are visible. Yes - the shirt has short sleeves, no - long sleeves.
Here are the results (from the top-left corner: original image, face mask (result of the GrabCut algorithm), masked face, HSV image, result of calculating the back projection, result of thresholding the previous one):
As you can see, unfortunately it fails for image 3, because the face is very similar in color to the shirt pattern (and generally the face color is quite close to white - something is wrong with this guy, he should spend more time outside ;) ).
The source is quite simple, but if you don't understand something feel free to ask:
import cv2
import numpy as np

def process_image(img, face_pos, title):
    if len(face_pos) == 0:
        print 'No face found!'
        return
    mask = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8) #create a mask with the same size as the image, but only one channel. The mask is initialized with zeros
    cv2.grabCut(img, mask, tuple(face_pos[0]), np.zeros((1,65), dtype=np.float64), np.zeros((1,65), dtype=np.float64), 1, cv2.GC_INIT_WITH_RECT) #use the GrabCut algorithm to find the mask of the face. See the GrabCut description for more details (it's quite a complicated algorithm)
    mask = np.where((mask==1) + (mask==3), 255, 0).astype('uint8') #set all pixels == 1 or == 3 to 255, set other pixels to 0
    img_masked = cv2.bitwise_and(img, img, mask=mask) #create the masked image - just to show the result of GrabCut
    #show images
    cv2.imshow(title, mask)
    cv2.imshow(title+' masked', img_masked)
    img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) #convert the image to HSV
    channels = [0,1]
    channels_ranges = [180, 256]
    channels_values = [0, 180, 0, 256]
    histogram = cv2.calcHist([img_hsv], channels, mask, channels_ranges, channels_values) #calculate the histogram of the H and S channels
    histogram = cv2.normalize(histogram, None, 0, 255, cv2.NORM_MINMAX) #normalize the histogram
    dst = cv2.calcBackProject([img_hsv], channels, histogram, channels_values, 1) #calculate the back projection (find all pixels with a color similar to the color of the face)
    cv2.imshow(title + ' calcBackProject raw result', dst)
    ret, thresholded = cv2.threshold(dst, 25, 255, cv2.THRESH_BINARY) #threshold the result of the previous step (removes noise etc.)
    cv2.imshow(title + ' thresholded', thresholded)
    cv2.waitKey(5000)
    #put partial results into one final image
    row1 = np.hstack((img, cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR), img_masked))
    row2 = np.hstack((img_hsv, cv2.cvtColor(dst, cv2.COLOR_GRAY2BGR), cv2.cvtColor(thresholded, cv2.COLOR_GRAY2BGR)))
    return np.vstack((row1, row2))

paths = ['1.jpg', '2.jpg', '3.jpg', '4.jpg']
haar_cascade = cv2.CascadeClassifier('C:\\DevTools\\src\\opencv\\data\\haarcascades\\haarcascade_frontalface_default.xml') #change it to the path to the face cascade - it's inside the opencv folder

for path in paths:
    img = cv2.imread(path)
    face_pos = haar_cascade.detectMultiScale(img, 1.3, 5, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    if len(face_pos) == 0: #if the Haar cascade failed to find any face, try again with different (more accurate, but slower) settings
        face_pos = haar_cascade.detectMultiScale(img, 1.1, 3, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    result = process_image(img, face_pos, path)
    cv2.imwrite('result_' + path, result) #save the result
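The code above stops at the thresholded skin mask; the actual short/long-sleeve decision (counting skin pixels outside the face rectangle, as described above) is not shown. A minimal sketch of that step, under the assumption that a simple pixel-count ratio is enough (the 1.5 threshold is just a guess you would have to tune), might look like this:

def has_short_sleeves(thresholded, face_rect, ratio=1.5):
    #hypothetical helper: compare the amount of skin outside the face rectangle with the amount inside it
    x, y, w, h = face_rect
    face_skin = cv2.countNonZero(thresholded[y:y+h, x:x+w])
    total_skin = cv2.countNonZero(thresholded)
    #clearly more skin than just the face -> arms are visible -> short sleeves
    return total_skin > ratio * max(face_skin, 1)

Here face_rect would be face_pos[0] from the detection step and thresholded is the image produced by cv2.threshold above.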
Checking the pattern (striped, plain, something else?) and determining the color of the shirt
Here I would try to extract (mask) the shirt from the image and then operate only on it. To achieve this I would try to use a similar approach as in the previous part - the GrabCut algorithm. Initializing it might be harder this time. A quite easy (but probably not perfect) solution which comes to my mind is:

- mark the area in the middle of the image as sure foreground - just draw some circle in the middle using the "sure foreground" color
- mark the corners of the image as sure background
- mark the face rectangle (the one found using the Haar cascade in the step "Checking whether the image shows a shirt with long or short sleeves") as sure background

Alternatively, you can initialize the whole mask as sure foreground or possible foreground and use the watershed algorithm to find the big white area (which is the background). Once you have this area - use it as background.

Most likely using those 2 solutions together will give you the best results; a rough sketch of the mask-initialization idea is shown below.
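For illustration, a mask-based GrabCut initialization along these lines could look like the code below. The circle radius, border thickness and iteration count are assumptions, and face_rect is the rectangle found by the Haar cascade:

import cv2
import numpy as np

def shirt_mask(img, face_rect):
    h, w = img.shape[:2]
    mask = np.full((h, w), cv2.GC_PR_BGD, dtype=np.uint8) #start with "probably background" everywhere
    cv2.rectangle(mask, (0, 0), (w - 1, h - 1), cv2.GC_BGD, 20) #thick border (covers the corners): sure background
    cv2.circle(mask, (w // 2, h // 2), min(w, h) // 8, cv2.GC_FGD, -1) #filled circle in the middle: sure foreground (shirt)
    x, y, fw, fh = face_rect
    mask[y:y + fh, x:x + fw] = cv2.GC_BGD #face rectangle: sure background
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, None, bgd_model, fgd_model, 3, cv2.GC_INIT_WITH_MASK)
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype('uint8')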
You can try a much easier solution as well. It looks like all the images have got the SHIRT (not background, skin or anything else) slightly over the center. Just like here: so you can just analyze only this part of the shirt. You can try to localize this sure-shirt part of the image using the Haar cascade as well - just find the face and then move the found rectangle down.
Once you have the masked shirt you can calculate its parameters. The 2 things which I would try are its color histogram (for the color) and some analysis of its pattern - see the Fourier-transform experiment in the edit below for the latter.
I know that those solutions aren't perfect, but I hope it helps. If you have any problems or questions - feel free to ask.
//edit:
I've done some simple pattern comparison using the Fourier transform. The results are... not very good, not very bad - better than nothing, but definitely not perfect ;) I would say it's a good point to start from.
The package with the code and images (yours + some from Google) is here. Code:
import cv2
import numpy as np
from collections import OrderedDict
import operator

def shirt_fft(img, face_pos, title):
    shirt_rect_pos = face_pos[0]
    # print shirt_rect_pos
    shirt_rect_pos[1] += 2*shirt_rect_pos[3] #move the rectangle with the face down (by 2 * its height) - now it will point to a shirt sample
    shirt_sample = img[shirt_rect_pos[1]:shirt_rect_pos[1]+shirt_rect_pos[3], shirt_rect_pos[0]:shirt_rect_pos[0]+shirt_rect_pos[2]].copy() #crop the shirt sample from the image
    shirt_sample = cv2.resize(shirt_sample, dsize=(256, 256)) #resize the sample to (256, 256)
    # cv2.imshow(title+' shirt sample', shirt_sample)
    shirt_sample_gray = cv2.cvtColor(shirt_sample, cv2.COLOR_BGR2GRAY) #convert to gray colorspace
    f = np.fft.fft2(shirt_sample_gray) #calculate the fft
    fshift = np.fft.fftshift(f) #shift - now the brightest point will be in the middle
    # fshift = fshift.astype(np.float32)
    magnitude_spectrum = 20*np.log(np.abs(fshift)) #calculate the magnitude spectrum (it's easier to show)
    print magnitude_spectrum.max(), magnitude_spectrum.min(), magnitude_spectrum.mean(), magnitude_spectrum.dtype
    magnitude_spectrum = cv2.normalize(magnitude_spectrum, alpha=255.0, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8UC1) #normalize the result and convert it to the 8UC1 (1 channel with 8 bits - unsigned char) datatype
    print magnitude_spectrum.max(), magnitude_spectrum.min(), magnitude_spectrum.mean(), magnitude_spectrum.dtype
    # cv2.imshow(title+' fft magnitude', magnitude_spectrum)
    magnitude_spectrum_original = magnitude_spectrum.copy()
    # temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, magnitude_spectrum.max()*0.75, 255.0, cv2.THRESH_TOZERO)
    # temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, 125, 255.0, cv2.THRESH_TOZERO)
    # temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, 250, 255.0, cv2.THRESH_TOZERO_INV) #clear the brightest part
    temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, 200, 255.0, cv2.THRESH_TOZERO) #clear all values from 0 to 200 - removes noise etc.
    # cv2.imshow(title+' fft magnitude thresholded', magnitude_spectrum)
    # cv2.waitKey(1)
    # if chr(cv2.waitKey(5000)) == 'q':
    #     quit()
    # return fshift
    return shirt_sample_gray, magnitude_spectrum_original, magnitude_spectrum

paths = ['1.jpg', '2.jpg', '3.jpg', '4.jpg', 'plain1.jpg', 'plain2.jpg', 'plain3.jpg', 'plain4.jpg', 'stripes1.jpg', 'stripes2.jpg']
haar_cascade = cv2.CascadeClassifier('C:\\DevTools\\src\\opencv\\data\\haarcascades\\haarcascade_frontalface_default.xml') #change it to the path to the face cascade - it's inside the opencv folder

fft_dict = OrderedDict()
results_img = None

for path in paths:
    img = cv2.imread(path)
    face_pos = haar_cascade.detectMultiScale(img, 1.3, 5, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    if len(face_pos) == 0: #if the Haar cascade failed to find any face, try again with different (more accurate, but slower) settings
        face_pos = haar_cascade.detectMultiScale(img, 1.1, 3, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    # result = process_image(img, face_pos, path)
    # cv2.imwrite('result_' + path, result) #save the result
    results = shirt_fft(img, face_pos, path)
    if results_img is None:
        results_img = np.hstack(results)
    else:
        results_img = np.vstack((results_img, np.hstack(results)))
    fft_dict[path] = results[2]

similarity_dict = {}
cv2.imshow('results_img', results_img)
cv2.waitKey(1)

#for each image calculate the value of its correlation with each other image
for i in range(len(fft_dict.keys())):
    for j in range(i+1, len(fft_dict.keys())):
    # for j in range(i, len(fft_dict.keys())):
        key1, key2 = fft_dict.keys()[i], fft_dict.keys()[j]
        print 'pair: ', key1, key2
        img1 = fft_dict[key1]
        img2 = fft_dict[key2].copy()
        # img2 = img2[10:246, 10:246]
        correlation = cv2.matchTemplate(img1, img2, cv2.TM_CCORR_NORMED)
        # correlation = cv2.matchTemplate(img1, img2, cv2.TM_SQDIFF_NORMED)
        # print correlation
        print correlation.shape, correlation.dtype, correlation.max()
        similarity_dict[key1 + ' - ' + key2] = correlation.max()
        # similarity_dict[key1 + ' - ' + key2] = correlation

#sort the values (from best to worst matches)
sorted_similarity_dict = sorted(similarity_dict.items(), key=operator.itemgetter(1), reverse=True)
print "final result: "
for a in sorted_similarity_dict:
    print a

cv2.waitKey(50000)
Some lines are commented out - you can try to use them; maybe you will achieve better results.
The basic algorithm is quite simple - for each image:

1. Crop a shirt sample from the image (find the face with the Haar cascade and move the found rectangle down by 2 * its height).
2. Resize the sample to (256, 256) and convert it to grayscale.
3. Calculate the FFT of the sample, shift it, and compute the magnitude spectrum.
4. Normalize the magnitude spectrum and threshold it to remove noise.

Now we can calculate the normalized cross-correlation of this image with all the other shirt samples. A high result -> similar samples. Final results:
('plain1.jpg - plain3.jpg', 1.0)
('plain3.jpg - plain4.jpg', 1.0)
('plain1.jpg - plain4.jpg', 1.0)
('stripes1.jpg - stripes2.jpg', 0.54650664)
('1.jpg - 3.jpg', 0.52512592)
('plain1.jpg - stripes1.jpg', 0.45395589)
('plain3.jpg - stripes1.jpg', 0.45395589)
('plain4.jpg - stripes1.jpg', 0.45395589)
('plain1.jpg - plain2.jpg', 0.39764369)
('plain2.jpg - plain4.jpg', 0.39764369)
('plain2.jpg - plain3.jpg', 0.39764369)
('2.jpg - stripes1.jpg', 0.36927304)
('2.jpg - plain3.jpg', 0.35678366)
('2.jpg - plain4.jpg', 0.35678366)
('2.jpg - plain1.jpg', 0.35678366)
('1.jpg - plain1.jpg', 0.28958824)
('1.jpg - plain3.jpg', 0.28958824)
('1.jpg - plain4.jpg', 0.28958824)
('2.jpg - 3.jpg', 0.27775836)
('4.jpg - plain3.jpg', 0.2560707)
('4.jpg - plain1.jpg', 0.2560707)
('4.jpg - plain4.jpg', 0.2560707)
('3.jpg - stripes1.jpg', 0.25498456)
('4.jpg - plain2.jpg', 0.24522379)
('1.jpg - 2.jpg', 0.2445447)
('plain4.jpg - stripes2.jpg', 0.24032137)
('plain3.jpg - stripes2.jpg', 0.24032137)
('plain1.jpg - stripes2.jpg', 0.24032137)
('3.jpg - stripes2.jpg', 0.23217434)
('plain2.jpg - stripes2.jpg', 0.22518013)
('2.jpg - stripes2.jpg', 0.19549081)
('plain2.jpg - stripes1.jpg', 0.1805127)
('3.jpg - plain4.jpg', 0.14908621)
('3.jpg - plain1.jpg', 0.14908621)
('3.jpg - plain3.jpg', 0.14908621)
('4.jpg - stripes2.jpg', 0.14738286)
('2.jpg - plain2.jpg', 0.14187276)
('3.jpg - 4.jpg', 0.13638313)
('1.jpg - stripes1.jpg', 0.13146029)
('4.jpg - stripes1.jpg', 0.11624481)
('1.jpg - plain2.jpg', 0.11515292)
('2.jpg - 4.jpg', 0.091361843)
('1.jpg - 4.jpg', 0.074155055)
('1.jpg - stripes2.jpg', 0.069594234)
('3.jpg - plain2.jpg', 0.059283193)
The image with all the shirt samples and their magnitude spectrums (before and after thresholding) is here:
The image names are (in the same order as the samples on this big image): ['1.jpg', '2.jpg', '3.jpg', '4.jpg', 'plain1.jpg', 'plain2.jpg', 'plain3.jpg', 'plain4.jpg', 'stripes1.jpg', 'stripes2.jpg']
As you can see, the thresholded images are quite similar for samples with the same pattern. I think this solution could work better if you just found a better way to compare those images (the thresholded magnitude spectrums).
edit2:
Just a simple idea - after you crop shirt samples from a lot of shirts, you can try to train some classifier and then recognize the patterns using this classifier. Look for tutorials about training Haar or LBP (local binary pattern) cascades.
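This is not the Haar/LBP cascade training mentioned above, but just to illustrate the general idea of classifying cropped shirt samples, here is a small sketch using a k-NN classifier from OpenCV's ml module (available in OpenCV 3+); the labels and the samples list are purely hypothetical:

import cv2
import numpy as np

def to_feature(shirt_bgr):
    #flatten a resized grayscale shirt sample into a feature vector
    gray = cv2.cvtColor(cv2.resize(shirt_bgr, (64, 64)), cv2.COLOR_BGR2GRAY)
    return gray.astype(np.float32).reshape(1, -1)

def train_pattern_knn(samples):
    #samples: list of (cropped_shirt_image, label) pairs, e.g. 0 = plain, 1 = striped
    feats = np.vstack([to_feature(img) for img, _ in samples])
    labels = np.array([label for _, label in samples], dtype=np.int32)
    knn = cv2.ml.KNearest_create()
    knn.train(feats, cv2.ml.ROW_SAMPLE, labels)
    return knn

def predict_pattern(knn, shirt_bgr, k=3):
    #return the majority label among the k nearest training samples
    _, result, _, _ = knn.findNearest(to_feature(shirt_bgr), k)
    return int(result[0][0])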