
Is there a way to check duplicate images with different names using php? [closed]

Tags:

php

Is there a way to detect duplicate images that have different file names using PHP? I want to delete all the duplicates.

Chali asked May 09 '12 17:05



3 Answers

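You can detect duplicates by comparing the sha1_file() hash of each file.

It returns a 40-character hexadecimal string; two files with the same contents produce the same hash regardless of their names.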
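
A minimal sketch of that comparison, assuming you have two file paths to check; the helper name `isSameFile` is hypothetical. A matching hash means the files are byte-for-byte identical, so this will not catch the same picture saved at a different size or quality.

```php
<?php
/**
 * Hypothetical helper: true when both files have identical contents.
 * sha1_file() returns a 40-character hex string, or false on failure.
 */
function isSameFile(string $pathA, string $pathB): bool
{
    // Files of different sizes can never be identical, so skip the hashing.
    if (filesize($pathA) !== filesize($pathB)) {
        return false;
    }

    $hashA = sha1_file($pathA);
    $hashB = sha1_file($pathB);

    // Treat an unreadable file as "not a duplicate".
    return $hashA !== false && $hashA === $hashB;
}
```

Checking the size first is only a shortcut: it avoids reading both files when they obviously differ.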

saravanabawa answered Oct 25 '22 06:10


I suppose a fairly simple solution would be to checksum each image with md5_file().

Open the directory, loop through the files generating an MD5 for each, compare the MD5s, and delete the duplicates.

EDIT: Here's a script using hash_file()

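<?php

$dir = "/full/path/to/images";
$checksums = array(); // md5 hash => true, for every file seen so far

if ($h = opendir($dir)) {
    while (($file = readdir($h)) !== false) {
        $path = "{$dir}/{$file}";

        // skip directories (including "." and "..")
        if (is_dir($path)) continue;

        $hash = hash_file('md5', $path);

        // skip files that could not be read
        if ($hash === false) continue;

        // delete duplicate
        if (isset($checksums[$hash])) {
            unlink($path);
        }
        // remember the hash of the first occurrence
        else {
            $checksums[$hash] = true;
        }
    }
    closedir($h);
}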
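
Because the script above deletes files as it goes, it can be worth doing a dry run first. The following is a sketch rather than part of the original answer: a hypothetical `findDuplicateFiles()` that only reports which files would be removed and which file each one duplicates.

```php
<?php
/**
 * Hypothetical dry-run helper: returns [duplicatePath => firstSeenPath]
 * for the files directly inside $dir, without deleting anything.
 */
function findDuplicateFiles(string $dir): array
{
    $seen = [];       // md5 hash => first file found with that hash
    $duplicates = []; // duplicate file => the file it duplicates

    $names = scandir($dir); // sorted alphabetically by default
    if ($names === false) {
        return [];
    }

    foreach ($names as $name) {
        $path = $dir . '/' . $name;
        if (!is_file($path)) {
            continue; // skips ".", ".." and subdirectories
        }

        $hash = hash_file('md5', $path);
        if ($hash === false) {
            continue; // unreadable file
        }

        if (isset($seen[$hash])) {
            $duplicates[$path] = $seen[$hash];
        } else {
            $seen[$hash] = $path;
        }
    }

    return $duplicates;
}
```

Looping over the returned array and printing each pair lets you review the list; once it looks right, calling unlink() on each key removes the duplicates.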
maček answered Oct 25 '22 08:10


I spent a lot of time looking for the best solution in PHP, but failed. Read my five steps to heaven (or just skip to step #5).

  1. hash_file() does not work as desired, because it only matches files that are byte-for-byte identical. For example, in a folder of 11,000 pictures with different names, of which I know only 800 are unique, hash_file() found only 30 matches.

  2. I could not install a third-party library like http://libpuzzle.pureftpd.org/project/libpuzzle/php on Windows + Openserver.

  3. I tried comparing by dominant color, or pixel by pixel with ImageColorAt(), creating a "digital stamp" of each image. It is very slow, takes a lot of coding, and in the end works badly: images that have been resized, merged or rotated go undetected.

  4. I checked GitHub for a ready-to-go solution, but found none in PHP (why? That surprised me).

  5. Finally, I found the shareware desktop program http://www.mindgems.com/products/VS-Duplicate-Image-Finder/VSDIF-Tutorials.htm?postinstall=1 which worked superbly. It is fast (it is multithreaded and loads the CPU to 100%; 8 GB and 11,000 images were compared in just ~30 seconds) and has all the necessary functions, exceptions and filtering. In that directory of 11,000 images the program found all the visually similar images, showed me the groups and their instances, and let me move the selected ones with auto-filters and so on. The main disadvantage is that it costs money.
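
Steps 1 and 3 describe the gap between byte-level hashing and visual comparison. One common middle ground, which the answer above does not mention, is a perceptual "average hash": shrink each image to 8x8, convert it to grayscale, and record which pixels are brighter than the mean. The sketch below assumes the GD extension is available; `averageHash()` and `hammingDistance()` are hypothetical names. It should tolerate resized or recompressed copies, but not rotated or cropped ones.

```php
<?php
/**
 * Hypothetical helper: 64-character string of "0"/"1" bits describing the image.
 * Requires the GD extension. Returns null if the file cannot be decoded.
 */
function averageHash(string $path): ?string
{
    $data = @file_get_contents($path);
    if ($data === false || $data === '') {
        return null;
    }
    $src = @imagecreatefromstring($data);
    if ($src === false) {
        return null;
    }

    // Shrink to 8x8 so that image size and fine detail no longer matter.
    $small = imagecreatetruecolor(8, 8);
    imagecopyresampled($small, $src, 0, 0, 0, 0, 8, 8, imagesx($src), imagesy($src));

    // Collect one grayscale value per pixel.
    $gray = [];
    for ($y = 0; $y < 8; $y++) {
        for ($x = 0; $x < 8; $x++) {
            $rgb = imagecolorat($small, $x, $y);
            $r = ($rgb >> 16) & 0xFF;
            $g = ($rgb >> 8) & 0xFF;
            $b = $rgb & 0xFF;
            $gray[] = ($r + $g + $b) / 3;
        }
    }

    // Each bit records whether the pixel is brighter than the average.
    $mean = array_sum($gray) / count($gray);
    $bits = '';
    foreach ($gray as $value) {
        $bits .= $value > $mean ? '1' : '0';
    }

    return $bits;
}

/** Number of positions at which two bit strings differ. */
function hammingDistance(string $a, string $b): int
{
    $distance = 0;
    $length = min(strlen($a), strlen($b));
    for ($i = 0; $i < $length; $i++) {
        if ($a[$i] !== $b[$i]) {
            $distance++;
        }
    }

    return $distance;
}
```

Two images whose hashes differ in only a few bits are probably visually similar. A cutoff of around 5 bits out of 64 is an arbitrary starting point to tune on your own pictures, and comparing every pair gets slow for large folders, so this is a sketch of the idea rather than a replacement for a dedicated tool.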

wtfowned answered Oct 25 '22 06:10