Well, the question is simple: I want to find similar images given a query image, similar to what TinEye does. Suppose I have a shirt with the following description:
Sleeve length : full
Collar : present
Pattern : striped
(The above data is just to give you a feel of the image - I actually don't have this data.)
The first image is the query image and the next ones should be the output of the similarity-finding algorithm. So based on the example we have some flexibility: we can show the user an image with a changed color, but all the images should have the same pattern, the same collar type and the same sleeve length. So I have to show outputs which are visually similar.
There are similar threads on Stack Overflow (link from Stack), and not only this one - there are many others. But I am confused about the approach to follow.
In my case I don't have to search in another category; I have to search in the same category, i.e. if the input is a shirt I will search in the shirt category only. That part has been done.
So the question is: what are the approaches to handle this problem? For the color it is no big issue - color information can easily be extracted through a color histogram. Let's say the input is a round-neck T-shirt, i.e. without a collar, half sleeve, and printed at the center with text. Now the output should be similar images: half sleeve, round neck, and printed text at the center, though the text may vary. I tried K-Means clustering and p-hash but that didn't work. Please enlighten me.
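For reference, a minimal sketch of the histogram-based color comparison I have in mind could look like this (the filenames are placeholders; cv2.HISTCMP_CORREL assumes OpenCV 3+):

import cv2

def color_histogram(path):
    #2D hue/saturation histogram in HSV space, normalized so images of different sizes can be compared
    img = cv2.imread(path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
    return cv2.normalize(hist, hist)

#correlation close to 1.0 -> very similar color distribution
similarity = cv2.compareHist(color_histogram('query.jpg'), color_histogram('candidate.jpg'), cv2.HISTCMP_CORREL)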
PS: I have to find similar images, not duplicates.
I would try to split this problem into 3 smaller problems:
- checking whether the image shows a shirt with long or short sleeves
- checking the pattern (striped, plain, something else?)
- determining the color of the shirt

Checking whether the image shows a shirt with long or short sleeves

This one is in my opinion the easiest. You mentioned that you have the category name, but based on Google Images it seems that it may not be obvious whether a shirt or T-shirt has long or short sleeves.
My solution is quite simple:

1. Find the face on the image using a Haar cascade.
2. Use the GrabCut algorithm (initialized with the face rectangle) to obtain a mask of the face.
3. Convert the image to HSV and calculate the H/S histogram of the masked face region.
4. Calculate the back projection of this histogram over the whole image - it finds all pixels with a color similar to the face color, i.e. skin.
5. Threshold the back projection to remove noise.

The final result of this algorithm is a black and white image which shows the skin regions. Using this image you can count the skin pixels and check whether skin is visible only on the face or somewhere else as well. You can try to find contours as well - generally both solutions will give you a way to check whether the hands are visible. Yes - the shirt has short sleeves, no - long sleeves.
Here are the results (from the top-left corner: original image, face mask (result of the GrabCut algorithm), masked face, HSV image, result of calculating the back projection, result of thresholding the previous one):
As you can see, unfortunately it fails for image 3, because the face is very similar in color to the shirt pattern (and generally the face color is quite close to white - something is wrong with this guy, he should spend more time outside ;) ).
The source is quite simple, but if you don't understand something feel free to ask:
import cv2
import numpy as np

def process_image(img, face_pos, title):
    if len(face_pos) == 0:
        print 'No face found!'
        return
    mask = np.zeros((img.shape[0], img.shape[1]), dtype=np.uint8) #create a mask with the same size as the image, but only one channel. The mask is initialized with zeros
    cv2.grabCut(img, mask, tuple(face_pos[0]), np.zeros((1,65), dtype=np.float64), np.zeros((1,65), dtype=np.float64), 1, cv2.GC_INIT_WITH_RECT) #use the GrabCut algorithm to find the mask of the face. See the GrabCut description for more details (it's quite a complicated algorithm)
    mask = np.where((mask==1) + (mask==3), 255, 0).astype('uint8') #set all pixels == 1 or == 3 to 255, set other pixels to 0
    img_masked = cv2.bitwise_and(img, img, mask=mask) #create the masked image - just to show the result of GrabCut
    #show images
    cv2.imshow(title, mask)
    cv2.imshow(title+' masked', img_masked)
    img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV) #convert the image to HSV
    channels = [0,1]
    channels_ranges = [180, 256]
    channels_values = [0, 180, 0, 256]
    histogram = cv2.calcHist([img_hsv], channels, mask, channels_ranges, channels_values) #calculate the histogram of the H and S channels
    histogram = cv2.normalize(histogram, None, 0, 255, cv2.NORM_MINMAX) #normalize the histogram
    dst = cv2.calcBackProject([img_hsv], channels, histogram, channels_values, 1) #calculate the back projection (find all pixels with a color similar to the color of the face)
    cv2.imshow(title + ' calcBackProject raw result', dst)
    ret, thresholded = cv2.threshold(dst, 25, 255, cv2.THRESH_BINARY) #threshold the result of the previous step (removes noise etc.)
    cv2.imshow(title + ' thresholded', thresholded)
    cv2.waitKey(5000)
    #put partial results into one final image
    row1 = np.hstack((img, cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR), img_masked))
    row2 = np.hstack((img_hsv, cv2.cvtColor(dst, cv2.COLOR_GRAY2BGR), cv2.cvtColor(thresholded, cv2.COLOR_GRAY2BGR)))
    return np.vstack((row1, row2))

paths = ['1.jpg', '2.jpg', '3.jpg', '4.jpg']
haar_cascade = cv2.CascadeClassifier('C:\\DevTools\\src\\opencv\\data\\haarcascades\\haarcascade_frontalface_default.xml') #change it to the path to the face cascade - it's inside the opencv folder

for path in paths:
    img = cv2.imread(path)
    face_pos = haar_cascade.detectMultiScale(img, 1.3, 5, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    if len(face_pos) == 0: #if the Haar cascade failed to find any face, try again with different (more accurate, but slower) settings
        face_pos = haar_cascade.detectMultiScale(img, 1.1, 3, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    result = process_image(img, face_pos, path)
    cv2.imwrite('result_' + path, result) #save the result
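The code above stops at the thresholded skin mask; the actual short/long-sleeve decision (counting skin pixels outside the face rectangle, as described above) is not shown. A minimal sketch of that step, under the assumption that a simple pixel-count ratio is enough (the 1.5 threshold is just a guess you would have to tune), might look like this:

def has_short_sleeves(thresholded, face_rect, ratio=1.5):
    #hypothetical helper: compare the amount of skin outside the face rectangle with the amount inside it
    x, y, w, h = face_rect
    face_skin = cv2.countNonZero(thresholded[y:y+h, x:x+w])
    total_skin = cv2.countNonZero(thresholded)
    #clearly more skin than just the face -> arms are visible -> short sleeves
    return total_skin > ratio * max(face_skin, 1)

Here face_rect would be face_pos[0] from the detection step and thresholded is the image produced by cv2.threshold above.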
Checking the pattern (striped, plain, something else?) and determining the color of the shirt
Here I would try to extract (mask) the shirt from the image and then operate only on it. To achieve this I would try to use a similar approach as in the previous part - the GrabCut algorithm. Initializing it might be harder this time. A quite easy (but probably not perfect) solution which comes to my mind is:

- mark the area in the middle of the image as sure foreground - just draw some circle in the middle using the "sure foreground" color
- mark the corners of the image as sure background
- mark the face rectangle (the one found using the Haar cascade in the step "Checking whether the image shows a shirt with long or short sleeves") as sure background

Alternatively, you can initialize the whole mask as sure foreground or possible foreground and use the watershed algorithm to find the big white area (which is the background). Once you have this area - use it as background.

Most likely using those 2 solutions together will give you the best results; a rough sketch of the mask-initialization idea is shown below.
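For illustration, a mask-based GrabCut initialization along these lines could look like the code below. The circle radius, border thickness and iteration count are assumptions, and face_rect is the rectangle found by the Haar cascade:

import cv2
import numpy as np

def shirt_mask(img, face_rect):
    h, w = img.shape[:2]
    mask = np.full((h, w), cv2.GC_PR_BGD, dtype=np.uint8) #start with "probably background" everywhere
    cv2.rectangle(mask, (0, 0), (w - 1, h - 1), cv2.GC_BGD, 20) #thick border (covers the corners): sure background
    cv2.circle(mask, (w // 2, h // 2), min(w, h) // 8, cv2.GC_FGD, -1) #filled circle in the middle: sure foreground (shirt)
    x, y, fw, fh = face_rect
    mask[y:y + fh, x:x + fw] = cv2.GC_BGD #face rectangle: sure background
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, None, bgd_model, fgd_model, 3, cv2.GC_INIT_WITH_MASK)
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype('uint8')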
You can try a much easier solution as well. It looks like all the images have got the SHIRT (not background, skin or anything else) slightly over the center. Just like here: so you can just analyze only this part of the shirt. You can try to localize this sure-shirt part of the image using the Haar cascade as well - just find the face and then move the found rectangle down.
Once you have the masked shirt you can calculate its parameters. The 2 things which I would try are its color histogram (for the color) and some analysis of its pattern - see the Fourier-transform experiment in the edit below for the latter.
I know that those solutions aren't perfect, but I hope it helps. If you have any problems or questions - feel free to ask.
//edit:
I've done some simple pattern comparison using the Fourier transform. The results are... not very good, not very bad - better than nothing, but definitely not perfect ;) I would say it's a good point to start from.
The package with the code and images (yours + some from Google) is here. Code:
import cv2
import numpy as np
from collections import OrderedDict
import operator

def shirt_fft(img, face_pos, title):
    shirt_rect_pos = face_pos[0]
    # print shirt_rect_pos
    shirt_rect_pos[1] += 2*shirt_rect_pos[3] #move the rectangle with the face down (by 2 * its height) - now it will point to a shirt sample
    shirt_sample = img[shirt_rect_pos[1]:shirt_rect_pos[1]+shirt_rect_pos[3], shirt_rect_pos[0]:shirt_rect_pos[0]+shirt_rect_pos[2]].copy() #crop the shirt sample from the image
    shirt_sample = cv2.resize(shirt_sample, dsize=(256, 256)) #resize the sample to (256, 256)
    # cv2.imshow(title+' shirt sample', shirt_sample)
    shirt_sample_gray = cv2.cvtColor(shirt_sample, cv2.COLOR_BGR2GRAY) #convert to gray colorspace
    f = np.fft.fft2(shirt_sample_gray) #calculate the fft
    fshift = np.fft.fftshift(f) #shift - now the brightest point will be in the middle
    # fshift = fshift.astype(np.float32)
    magnitude_spectrum = 20*np.log(np.abs(fshift)) #calculate the magnitude spectrum (it's easier to show)
    print magnitude_spectrum.max(), magnitude_spectrum.min(), magnitude_spectrum.mean(), magnitude_spectrum.dtype
    magnitude_spectrum = cv2.normalize(magnitude_spectrum, alpha=255.0, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8UC1) #normalize the result and convert it to the 8UC1 (1 channel with 8 bits - unsigned char) datatype
    print magnitude_spectrum.max(), magnitude_spectrum.min(), magnitude_spectrum.mean(), magnitude_spectrum.dtype
    # cv2.imshow(title+' fft magnitude', magnitude_spectrum)
    magnitude_spectrum_original = magnitude_spectrum.copy()
    # temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, magnitude_spectrum.max()*0.75, 255.0, cv2.THRESH_TOZERO)
    # temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, 125, 255.0, cv2.THRESH_TOZERO)
    # temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, 250, 255.0, cv2.THRESH_TOZERO_INV) #clear the brightest part
    temp, magnitude_spectrum = cv2.threshold(magnitude_spectrum, 200, 255.0, cv2.THRESH_TOZERO) #clear all values from 0 to 200 - removes noise etc.
    # cv2.imshow(title+' fft magnitude thresholded', magnitude_spectrum)
    # cv2.waitKey(1)
    # if chr(cv2.waitKey(5000)) == 'q':
    #     quit()
    # return fshift
    return shirt_sample_gray, magnitude_spectrum_original, magnitude_spectrum

paths = ['1.jpg', '2.jpg', '3.jpg', '4.jpg', 'plain1.jpg', 'plain2.jpg', 'plain3.jpg', 'plain4.jpg', 'stripes1.jpg', 'stripes2.jpg']
haar_cascade = cv2.CascadeClassifier('C:\\DevTools\\src\\opencv\\data\\haarcascades\\haarcascade_frontalface_default.xml') #change it to the path to the face cascade - it's inside the opencv folder

fft_dict = OrderedDict()
results_img = None

for path in paths:
    img = cv2.imread(path)
    face_pos = haar_cascade.detectMultiScale(img, 1.3, 5, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    if len(face_pos) == 0: #if the Haar cascade failed to find any face, try again with different (more accurate, but slower) settings
        face_pos = haar_cascade.detectMultiScale(img, 1.1, 3, cv2.CASCADE_FIND_BIGGEST_OBJECT)
    # result = process_image(img, face_pos, path)
    # cv2.imwrite('result_' + path, result) #save the result
    results = shirt_fft(img, face_pos, path)
    if results_img is None:
        results_img = np.hstack(results)
    else:
        results_img = np.vstack((results_img, np.hstack(results)))
    fft_dict[path] = results[2]

similarity_dict = {}
cv2.imshow('results_img', results_img)
cv2.waitKey(1)

#for each image calculate the value of its correlation with each other image
for i in range(len(fft_dict.keys())):
    for j in range(i+1, len(fft_dict.keys())):
    # for j in range(i, len(fft_dict.keys())):
        key1, key2 = fft_dict.keys()[i], fft_dict.keys()[j]
        print 'pair: ', key1, key2
        img1 = fft_dict[key1]
        img2 = fft_dict[key2].copy()
        # img2 = img2[10:246, 10:246]
        correlation = cv2.matchTemplate(img1, img2, cv2.TM_CCORR_NORMED)
        # correlation = cv2.matchTemplate(img1, img2, cv2.TM_SQDIFF_NORMED)
        # print correlation
        print correlation.shape, correlation.dtype, correlation.max()
        similarity_dict[key1 + ' - ' + key2] = correlation.max()
        # similarity_dict[key1 + ' - ' + key2] = correlation

#sort the values (from best to worst matches)
sorted_similarity_dict = sorted(similarity_dict.items(), key=operator.itemgetter(1), reverse=True)
print "final result: "
for a in sorted_similarity_dict:
    print a

cv2.waitKey(50000)
Some lines are commented out - you can try to use them; maybe you will achieve better results.
The basic algorithm is quite simple - for each image:

1. Crop a shirt sample from the image (find the face with the Haar cascade and move the found rectangle down by 2 * its height).
2. Resize the sample to (256, 256) and convert it to grayscale.
3. Calculate the FFT of the sample, shift it, and compute the magnitude spectrum.
4. Normalize the magnitude spectrum and threshold it to remove noise.

Now we can calculate the normalized cross-correlation of this image with all the other shirt samples. A high result -> similar samples. Final results:
('plain1.jpg - plain3.jpg', 1.0)
('plain3.jpg - plain4.jpg', 1.0)
('plain1.jpg - plain4.jpg', 1.0)
('stripes1.jpg - stripes2.jpg', 0.54650664)
('1.jpg - 3.jpg', 0.52512592)
('plain1.jpg - stripes1.jpg', 0.45395589)
('plain3.jpg - stripes1.jpg', 0.45395589)
('plain4.jpg - stripes1.jpg', 0.45395589)
('plain1.jpg - plain2.jpg', 0.39764369)
('plain2.jpg - plain4.jpg', 0.39764369)
('plain2.jpg - plain3.jpg', 0.39764369)
('2.jpg - stripes1.jpg', 0.36927304)
('2.jpg - plain3.jpg', 0.35678366)
('2.jpg - plain4.jpg', 0.35678366)
('2.jpg - plain1.jpg', 0.35678366)
('1.jpg - plain1.jpg', 0.28958824)
('1.jpg - plain3.jpg', 0.28958824)
('1.jpg - plain4.jpg', 0.28958824)
('2.jpg - 3.jpg', 0.27775836)
('4.jpg - plain3.jpg', 0.2560707)
('4.jpg - plain1.jpg', 0.2560707)
('4.jpg - plain4.jpg', 0.2560707)
('3.jpg - stripes1.jpg', 0.25498456)
('4.jpg - plain2.jpg', 0.24522379)
('1.jpg - 2.jpg', 0.2445447)
('plain4.jpg - stripes2.jpg', 0.24032137)
('plain3.jpg - stripes2.jpg', 0.24032137)
('plain1.jpg - stripes2.jpg', 0.24032137)
('3.jpg - stripes2.jpg', 0.23217434)
('plain2.jpg - stripes2.jpg', 0.22518013)
('2.jpg - stripes2.jpg', 0.19549081)
('plain2.jpg - stripes1.jpg', 0.1805127)
('3.jpg - plain4.jpg', 0.14908621)
('3.jpg - plain1.jpg', 0.14908621)
('3.jpg - plain3.jpg', 0.14908621)
('4.jpg - stripes2.jpg', 0.14738286)
('2.jpg - plain2.jpg', 0.14187276)
('3.jpg - 4.jpg', 0.13638313)
('1.jpg - stripes1.jpg', 0.13146029)
('4.jpg - stripes1.jpg', 0.11624481)
('1.jpg - plain2.jpg', 0.11515292)
('2.jpg - 4.jpg', 0.091361843)
('1.jpg - 4.jpg', 0.074155055)
('1.jpg - stripes2.jpg', 0.069594234)
('3.jpg - plain2.jpg', 0.059283193)
The image with all the shirt samples and their magnitude spectrums (before and after thresholding) is here:
The image names are (in the same order as the samples on this big image): ['1.jpg', '2.jpg', '3.jpg', '4.jpg', 'plain1.jpg', 'plain2.jpg', 'plain3.jpg', 'plain4.jpg', 'stripes1.jpg', 'stripes2.jpg']
As you can see, the thresholded images are quite similar for samples with the same pattern. I think this solution could work better if you just found a better way to compare those images (the thresholded magnitude spectrums).
edit2:
Just a simple idea - after you crop shirt samples from a lot of shirts, you can try to train some classifier and then recognize the patterns using this classifier. Look for tutorials about training Haar or LBP (local binary pattern) cascades.
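This is not the Haar/LBP cascade training mentioned above, but just to illustrate the general idea of classifying cropped shirt samples, here is a small sketch using a k-NN classifier from OpenCV's ml module (available in OpenCV 3+); the labels and the samples list are purely hypothetical:

import cv2
import numpy as np

def to_feature(shirt_bgr):
    #flatten a resized grayscale shirt sample into a feature vector
    gray = cv2.cvtColor(cv2.resize(shirt_bgr, (64, 64)), cv2.COLOR_BGR2GRAY)
    return gray.astype(np.float32).reshape(1, -1)

def train_pattern_knn(samples):
    #samples: list of (cropped_shirt_image, label) pairs, e.g. 0 = plain, 1 = striped
    feats = np.vstack([to_feature(img) for img, _ in samples])
    labels = np.array([label for _, label in samples], dtype=np.int32)
    knn = cv2.ml.KNearest_create()
    knn.train(feats, cv2.ml.ROW_SAMPLE, labels)
    return knn

def predict_pattern(knn, shirt_bgr, k=3):
    #return the majority label among the k nearest training samples
    _, result, _, _ = knn.findNearest(to_feature(shirt_bgr), k)
    return int(result[0][0])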