Image clustering by similarity in Python

2 Answers

I had the same problem and I came up with this solution:

Import a pretrained model using Keras (here VGG16)
Extract features per image
Cluster the features with k-means
Export the images by copying them with the cluster label in the file name

Here is my code, partly motivated by this post.

from keras.preprocessing import image
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input
import numpy as np
from sklearn.cluster import KMeans
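import os, shutil, glob
from PIL import ImageFile

# Let PIL load truncated image files instead of raising an error
ImageFile.LOAD_TRUNCATED_IMAGES = True
# Pretrained VGG16 without its classification layers, used as a feature extractor
model = VGG16(weights='imagenet', include_top=False)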

# Variables
imdir = 'C:/indir/'
targetdir = "C:/outdir/"
number_clusters = 3

# Loop over files and get features
filelist = glob.glob(os.path.join(imdir, '*.jpg'))
filelist.sort()
featurelist = []
for i, imagepath in enumerate(filelist):
    print("    Status: %s / %s" %(i, len(filelist)), end="\r")
    img = image.load_img(imagepath, target_size=(224, 224))
    img_data = image.img_to_array(img)
    img_data = np.expand_dims(img_data, axis=0)
    img_data = preprocess_input(img_data)
    features = np.array(model.predict(img_data))
    featurelist.append(features.flatten())

# Clustering
kmeans = KMeans(n_clusters=number_clusters, random_state=0).fit(np.array(featurelist))

# Copy images renamed by cluster
# Create the target dir if it does not exist yet
os.makedirs(targetdir, exist_ok=True)
# Copy with cluster name
print("\n")
for i, m in enumerate(kmeans.labels_):
    print("    Copy: %s / %s" %(i, len(kmeans.labels_)), end="\r")
    shutil.copy(filelist[i], os.path.join(targetdir, str(m) + "_" + str(i) + ".jpg"))

Update 02/2022:

When the number of clusters is not known in advance, Affinity Propagation may be a better choice than k-means, because it determines the number of clusters itself. In that case, replace the KMeans call with:

from sklearn.cluster import AffinityPropagation
affprop = AffinityPropagation(affinity="euclidean", damping=0.5).fit(np.array(featurelist))

and loop over affprop.labels_ to access the results.
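As a rough sketch of what looping over affprop.labels_ could look like, the snippet below groups file names by cluster label. The feature vectors and file names here are synthetic stand-ins invented for illustration; in the script above they would be featurelist and filelist.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import AffinityPropagation

# Synthetic stand-ins for featurelist and filelist (hypothetical data):
# two well-separated groups of three 4-dimensional feature vectors each.
rng = np.random.default_rng(0)
featurelist = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(3, 4)),
    rng.normal(loc=10.0, scale=0.1, size=(3, 4)),
])
filelist = ["img_%d.jpg" % i for i in range(len(featurelist))]

affprop = AffinityPropagation(
    affinity="euclidean", damping=0.5, random_state=0
).fit(featurelist)

# Group file names by the cluster label assigned to each feature vector.
clusters = defaultdict(list)
for path, label in zip(filelist, affprop.labels_):
    clusters[int(label)].append(path)

for label, paths in sorted(clusters.items()):
    print(label, paths)
```

The same dictionary could drive the copy step, using the label as a file name prefix in the same way as the k-means version.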


answered Nov 15 '22 17:11

Peter

This question is too broad.

The main question is what your features should be. That is difficult to answer without knowing what you are trying to accomplish. If your images are small and all the same size, you can simply use every pixel as a feature. If you have metadata and would like to sort by it, you can use every metadata tag as a feature.
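A minimal sketch of the every-pixel-as-a-feature idea, assuming small grayscale images of identical size. The images here are random stand-ins (hypothetical data), and k-means with two clusters is just one possible choice of algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in data: ten 8x8 grayscale images, five dark and five bright.
rng = np.random.default_rng(0)
dark = rng.uniform(0, 50, size=(5, 8, 8))
bright = rng.uniform(200, 255, size=(5, 8, 8))
images = np.concatenate([dark, bright])

# Every pixel becomes one feature: each image is flattened into one row of X.
X = images.reshape(len(images), -1)  # shape (10, 64)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)
```

This only works when all images share the same dimensions, since every row of X must have the same length.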

If you really need to find patterns across images, you will have to apply an additional layer of processing, such as a convolutional neural network, which essentially lets you extract features from different regions of an image. You can think of it as a filter that converts every image into, say, an 8x8 matrix, which can then be flattened into a row of 64 features in your array X for clustering.
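To make the "8x8 matrix to 64 features" step concrete, here is a small sketch that uses plain block averaging as a stand-in for the filter. This is not a convolutional network, only an illustration of how each image could end up as one row of X; the function name block_average_features and the random images are made up for this example.

```python
import numpy as np

def block_average_features(img, grid=8):
    """Reduce a 2-D grayscale image to a grid x grid matrix of block means,
    then flatten it into a feature vector of length grid * grid."""
    h, w = img.shape
    # Crop so that both sides divide evenly by the grid size.
    h_crop, w_crop = h - h % grid, w - w % grid
    img = img[:h_crop, :w_crop]
    # Split into grid x grid blocks and average within each block.
    blocks = img.reshape(grid, h_crop // grid, grid, w_crop // grid)
    return blocks.mean(axis=(1, 3)).flatten()

# Hypothetical stand-in data: four random 64x64 grayscale images.
rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(4, 64, 64))

# One row of 64 features per image, ready for a clustering algorithm.
X = np.array([block_average_features(img) for img in images])
print(X.shape)
```

A pretrained network such as the VGG16 model in the other answer plays the same role, but produces learned features rather than simple averages.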

answered Nov 15 '22 16:11

omdv

Tags: python, machine-learning, cluster-analysis, computer-vision

alex
