Compare images and remove duplicates

I have two folders of images, all PNGs. One folder is a copy of the other with some images changed and some added. The filenames are the same, but the image contents may differ. Other attributes, such as timestamps, are completely random, unfortunately.

In the newer folder, I want to remove the duplicates (by content) and keep only the updated and new images.

I installed ImageMagick to use the compare command but I can't figure it out. :-( Can you help me please? Thanks in advance!

Added: I'm on Mac OS X.

asked May 05 '15 17:05

user2983816

2 Answers

You don't say whether you are on OS X/Linux or Windows, but I can get you started. ImageMagick can calculate a hash (checksum) of all the pixel data in an image, regardless of date or timestamp, like this:

identify -format "%# %f\n" *.png

25a3591a58550edd2cff65081eab11a86a6a62e006431c8c4393db8d71a1dfe4 blue.png
304c0994c751e75eac86bedac544f716560be5c359786f7a5c3cd6cb8d2294df green.png
466f1bac727ac8090ba2a9a13df8bfb6ada3c4eb3349087ce5dc5d14040514b5 grey.png
042a7ebd78e53a89c0afabfe569a9930c6412577fcf3bcfbce7bafe683e93e8a hue.png
d819bfdc58ac7c48d154924e445188f0ac5a0536cd989bdf079deca86abb12a0 lightness.png
b63ad69a056033a300f23c31f9425df6f469e79c2b9f3a5c515db3b52c323a65 montage.png
a42a5f0abac3bd2f6b4cbfde864342401847a120dacae63294edb45b38edd34e red.png
10bf63fd725c5e02c56df54f503d0544f14f754d852549098d5babd8d3daeb84 sample.png
e95042f227d2d7b2b3edd4c7eec05bbf765a09484563c5ff18bc8e8aa32c1a8e sat.png

So, if you do that in each folder you will have the checksums of all the files with their names beside them in a separate file for each folder.

If you then merge the two files and sort them you can find duplicates quite easily since the duplicated files will come up next to each other.
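The merge-and-sort idea can be sketched roughly as follows. This is only a minimal sketch: list_adjacent_dups is a hypothetical helper name, and it assumes each listing file holds one "hash filename" pair per line, as produced by the identify command above. Note that it also reports two files with the same hash inside a single folder.

```shell
# Sketch only: list_adjacent_dups is a hypothetical helper, not an ImageMagick tool.
# $1 and $2 are listing files with one "hash filename" pair per line.
list_adjacent_dups() {
    # Merge and sort both listings so that equal hashes end up on adjacent lines
    LC_ALL=C sort "$1" "$2" | awk '
        # Everything after the first space is the filename (may contain spaces)
        { name = substr($0, index($0, " ") + 1) }
        # Same hash as the previous line: report the pair
        $1 == prev_hash { print prev_name " = " name }
        # Remember this line for the next comparison
        { prev_hash = $1; prev_name = name }'
}
```

Calling it with the two listing files prints one "name = name" line per adjacent pair of equal hashes.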

Let's say you run the identify command above in two folders, dira and dirb, like this:

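# assumes dira and dirb are sibling folders, and that neither lives directly in $HOME
# (otherwise the listing files $HOME/dira and $HOME/dirb would clash with them)
cd dira
identify -format "%# %f\n" *.png > "$HOME/dira"

cd ../dirb
identify -format "%# %f\n" *.png > "$HOME/dirb"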

Then you could do something like this in awk

awk 'FNR==NR{name[$1]=$2;next}
            { 
               if($1 in name){print $2 " duplicates " name[$1]}
            }' $HOME/dir*

So, the $HOME/dir* part passes both files to awk. The block in {} after FNR==NR applies only to the first file read; as it is read, we fill an associative array, indexed by hash, with the filenames. Then, while reading the second file, we check whether each hash has already been seen. If it has, we report a duplicate, printing the name from the second file ($2) and the name saved from the first file (name[$1]).
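Since the question is about files that keep the same name in both folders, the same two-file awk pattern could be adapted to list only the files in the newer folder whose hash and filename both match the older listing. This is a rough sketch, not a tested cleanup tool: unchanged_in_new is a hypothetical helper name, and treating dirb as the newer folder is an assumption. Back up the images and do a dry run first.

```shell
# Sketch only: unchanged_in_new is a hypothetical helper name.
# $1 = listing of the older folder, $2 = listing of the newer folder,
# both with one "hash filename" pair per line.
unchanged_in_new() {
    awk 'FNR == NR { seen[$0] = 1; next }   # first file: remember each whole line
         # second file: an identical line means same hash and same filename,
         # so print just the filename (everything after the first space)
         $0 in seen { print substr($0, index($0, " ") + 1) }' "$1" "$2"
}

# Dry run (assumes dirb is the newer folder): prints the rm commands without running them
# unchanged_in_new "$HOME/dira" "$HOME/dirb" | while IFS= read -r f; do
#     echo rm -- "dirb/$f"
# done
```

Comparing whole lines rather than $1 and $2 means filenames with spaces are handled without changing the separator.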

This won't work with filenames with spaces in them, so if that is a problem, change the identify command to put a colon between the hash and the filename like this:

identify -format "%#:%f\n" *.png

and change the awk to awk -F":" and it should work again.
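Put together, the colon-separated variant might look something like this. It is a sketch under the assumption that both listings were produced with the "%#:%f\n" format; report_dups_colon is a hypothetical helper name. A filename that itself contains a colon would still be split in the wrong place.

```shell
# Sketch only: report_dups_colon is a hypothetical helper name.
# $1 and $2 are listing files with one "hash:filename" pair per line.
report_dups_colon() {
    awk -F":" 'FNR == NR { name[$1] = $2; next }   # first file: hash -> filename
               $1 in name { print $2 " duplicates " name[$1] }' "$1" "$2"
}
```

With ":" as the field separator, $2 is the whole filename even when it contains spaces.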

answered Oct 30 '22 23:10

Mark Setchell

Here’s my ugly solution for PowerShell (which is now multi-platform). I wrote it for a one-off, but it should work. I tried to comment it a bit to compensate for how bad it is.

I’d back up your images before doing this, though. Just in case.

The catch here is that it only detects whether each file is a duplicate of the previous one. If you need to check whether each file is a duplicate of any other, you’ll want to nest another for() loop in there, which should be easy enough.

#get the list of files with imagemagick
#powershell handily populates $files as an array, split by line
#this will take a bit
$files = identify -format "%# %f\n" *.png

$arr = @()
foreach($line in $files) {
    #add 2 keys to the new array per line (hash and then filename)
    $arr += @($line.Split(" "))
}

#for every 2 keys (eg each hash)
for($i = 2; $i -lt $arr.Length; $i += 2) {
    #compare it to the last hash
    if($arr[$i] -eq $arr[$i-2]) {
        #print a helpful message and then delete
        echo "$($arr[$i].Substring(0,16)) = $($arr[$i-2].Substring(0,16)) (removing $($arr[$i+1]))"
        remove-item ($arr[$i+1])
    }
}

Bonus: To delete any images with a particular hash (an all black 640×480 png in my case):

#start at 0 here so the first file is checked too
for($i = 0; $i -lt $arr.Length; $i += 2) {
    if($arr[$i] -eq "f824c1a8a1128713f17dd8d1190d70e6012b509606d986e7a6c81e40b628df2b") {
        echo "$($arr[$i+1])"
        remove-item ($arr[$i+1])
    }
}

Double bonus: C code to check whether a written image collides with a given hash in a hash/ folder and delete it if so. It was written for Windows/MinGW but shouldn’t be too hard to port if necessary. Might be superfluous, but I figured I’d throw it out there in case it’s useful to anyone.

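//fragment from inside a void function; needs <stdio.h>, <time.h>, <io.h> (for _access) and <unistd.h> (for unlink)
char filename[256] = "output/UNINITIALIZED.ppm";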
unsigned long int timeint = time(NULL);
sprintf(filename, "../output/image%lu.ppm", timeint);
if(
    writeppm(
        filename,
        SCREEN_WIDTH,
        SCREEN_HEIGHT,
        screenSurface->pixels
        ) != 0
) {
    printf("image write error!\n");
    return;
}
char shacmd[256];
sprintf(shacmd, "sha256sum %s", filename);
FILE *file = popen(shacmd, "r");
if(file == NULL) {
    printf("failed to get image hash!\n");
    return;
}
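//the hash is 64 characters but we need a 0 at the end too
char sha[96];
int i;
int c; //int rather than char so EOF can be detected reliably
//get hash until the first space
for(i = 0; i < 64; i++) {
    c = fgetc(file);
    if(c == EOF || c == ' ') break;
    sha[i] = (char)c;
}
//terminate the string so it can be used with %s below
sha[i] = '\0';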
pclose(file);

char hashfilename[256];
sprintf(hashfilename, "../output/hash/%s", sha);

if(_access(hashfilename, 0) != -1) {
    //file exists, delete img
    if(unlink(filename) != 0) {
        printf("image delete error!\n");
    }
} else {
    FILE *hashfile = fopen(hashfilename, "w");
    if(hashfile == NULL) {
        printf("hash file write error!\nfilename: %s\n", hashfilename);
    } else {
        //only close the file if it was actually opened
        fclose(hashfile);
    }
}

answered Oct 31 '22 01:10

9999years

Tags: compare, image, png, imagemagick