I am analyzing a very big number of images and extracting the dominant color codes.
I want to group them into ranges of generic color names, like Green, Dark Green, Light Green, Blue, Dark Blue, Light Blue and so on.
I am looking for a language agnostic way in order to implement something by myself, if there are examples I can look into in order to achieve this I would be more than grateful.
In machine learning field, what you want to do is called classification, in which the goal is to assign the label of one of the classes (color) to each of the observations (images). To do this, classes must be pre-defined. Suppose these are the colors we want to assign to images:

To determine the dominant color of an image, the distance between each of its pixels and all the colors in the table must be calculated. Note that this distance is calculated in RGB color space. To calculate the distance between the ij-th pixel of the image and the k-th color of the table, the following equation can be used:
d_ijk = sqrt((r_ij-r_k)^2+(g_ij-g_k)^2+(b_ij-b_k)^2)
In the next step, for each pixel, the closest color in the table is selected. This is the concept used to compress an image using indexed colors (except that here the palette is the same for all images and is not calculated for each to minimize the difference between the original and the indexed image). Now, as @jairoar pointed out, we can get the histogram of the image (not to be confused with RGB histogram or intensity histogram), and determine the color that has the most repetition.
To show the result of these steps, I used random crops of this work of art! of mine:

This is how images look, before and after indexing (left: original, right: indexed):
And these are most repeated colors (left: indexed, right: dominant color):

But since you said the number of images is large, you should know that these calculations are relatively time consuming. But the good news is that there are ways to increase the performance. For example, instead of using the Euclidean distance (formula above), you can use the City Block or Chebyshev distance. You can also calculate the distance only for a fraction of the pixels instead of calculating it for all the pixels in an image. For this purpose, you can first scale the image to a much smaller size (for example, 32 by 32) and perform calculations for the pixels of this reduced image. If you decided to resize images, don not bother to use bilinear or bicubic interpolations, it doesn't worth the extra computation. Instead, go for the nearest neighbor, which actually performs a rectangular lattice sampling on the original image.

Although the mentioned changes will greatly increase the speed of calculations, but nothing good comes for free. This is a trade-off of performance versus accuracy. For example, in the previous two pictures, we see that the image, which was initially recognized as orange (code 20), has been recognized as pink (code 26) after resizing it.
To determine the parameters of the algorithm (distance measurement, reduced image size and scaling algorithm), you must first perform the classification operation on a number of images with the highest possible accuracy and keep the results as the ground truth. Then, with multiple experiments, obtain a combination of parameters that do not make the classification error more than a maximum tolerable value.
@saastn's fantastic answer assumes you have a set of pre-defined colors that you want to sort your images to. The implementation is easier if you just want to classify the images to one color out of some set of X equidistant colors, a la histogram.
To summarize, round the color of each pixel in the image to the nearest color out of some set of equidistant color bins. This reduces the precision of your colors down to whatever amount of colors that you desire. Then count all of the colors in the image and select the most frequent color as your classification for that image.
Here is my implementation of this in Python:
import cv2
import numpy as np
#Set this to the number of colors that you want to classify the images to
number_of_colors = 8
#Verify that the number of colors chosen is between the minimum possible and maximum possible for an RGB image.
assert 8 <= number_of_colors <= 16777216
#Get the cube root of the number of colors to determine how many bins to split each channel into.
number_of_values_per_channel = number_of_colors ** ( 1 / 3 )
#We will divide each pixel by its maximum value divided by the number of bins we want to divide the values into (minus one for the zero bin).
divisor = 255 / (number_of_values_per_channel - 1)
#load the image and convert it to float32 for greater precision. cv2 loads the image in BGR (as opposed to RGB) format.
image = cv2.imread("image.png", cv2.IMREAD_COLOR).astype(np.float32)
#Divide each pixel by the divisor defined above, round to the nearest bin, then convert float32 back to uint8.
image = np.round(image / divisor).astype(np.uint8)
#Flatten the columns and rows into just one column per channel so that it will be easier to compare the columns across the channels.
image = image.reshape(-1, image.shape[2])
#Find and count matching rows (pixels), where each row consists of three values spread across three channels (Blue column, Red column, Green column).
uniques = np.unique(image, axis=0, return_counts=True)
#The first of the two arrays returned by np.unique is an array compromising all of the unique colors.
colors = uniques[0]
#The second of the two arrays returend by np.unique is an array compromising the counts of all of the unique colors.
color_counts = uniques[1]
#Get the index of the color with the greatest frequency
most_common_color_index = np.argmax(color_counts)
#Get the color that was the most common
most_common_color = colors[most_common_color_index]
#Multiply the channel values by the divisor to return the values to a range between 0 and 255
most_common_color = most_common_color * divisor
#If you want to name each color, you could also provide a list sorted from lowest to highest BGR values comprising of
#the name of each possible color, and then use most_common_color_index to retrieve the name.
print(most_common_color)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With