I have a library of about 1 million images, and roughly half of them are watermarked with the same half-transparent watermark in the same spot.
Where do I begin with detecting the watermarked images? Are there any standard tools for this purpose?
If, as your question suggests, you just want to detect which images are watermarked, you can use the following algorithm:
The code could be something like this:
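// Load the watermark and the image to test (both PNG here).
$wmark = imagecreatefrompng("watermark.png");
$testimage = imagecreatefrompng("test.png");
list($width, $height) = getimagesize("watermark.png");
// Number of pixels that must match; here, every watermark pixel.
$no_of_pixels = $width * $height;
$matched = 0;
// Assumes the watermark sits at the top-left corner of the test image
// and that the test image is at least as large as the watermark.
for ($h = 0; $h < $height; $h++) {
    for ($w = 0; $w < $width; $w++) {
        $wmpixel = imagecolorsforindex($wmark, imagecolorat($wmark, $w, $h));
        $testpixel = imagecolorsforindex($testimage, imagecolorat($testimage, $w, $h));
        // Exact comparison of red, green, blue and alpha; a half-transparent
        // watermark blended into the image will need a tolerance instead.
        if ($wmpixel == $testpixel) {
            $matched++;
        }
    }
}
if ($matched == $no_of_pixels) echo "Voila, we found it!";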
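Because the watermark is half transparent, the pixels underneath it will rarely equal the watermark's own pixels exactly, so an exact comparison can miss it. One possible alternative, which is not part of the original answer, is to correlate the region where the watermark should be with the watermark pattern and flag the image when the correlation is high. Below is a minimal Python sketch of that idea. It assumes grayscale images stored as lists of rows of numbers; the function names, the `left`/`top` position parameters and the `0.5` threshold are all hypothetical and would need tuning on real data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def watermark_score(image, mark, left, top):
    """Correlate the region of `image` at (left, top) with `mark`.

    Both arguments are grayscale images given as lists of rows.
    The region must lie fully inside `image`.
    """
    mark_h = len(mark)
    mark_w = len(mark[0])
    region = [image[top + y][left + x]
              for y in range(mark_h) for x in range(mark_w)]
    pattern = [value for row in mark for value in row]
    return pearson(region, pattern)


def is_watermarked(image, mark, left, top, threshold=0.5):
    """Flag the image when the score reaches the (assumed) threshold."""
    return watermark_score(image, mark, left, top) >= threshold
```

Since the watermark is always in the same spot, the score only has to be computed once per image at that position. For a library of a million images, it would be sensible to pick the threshold by looking at the scores of a small hand-labeled sample first.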
I just saw your thumbnail example. If you only want to detect text, you can try tesseract-ocr or PhpOCR.
You may also consider PHPSane.
Detecting almost any feature in an image is called object detection. There is a widely used library called OpenCV. It has a very simple SDK, although setting it up can be a real pain. It is well supported for C/C++ and (nearly as well supported for) Python. It took me around 3 weeks to train my own classifier the first time I used OpenCV.
But I would not depend on this solution entirely; consider your priorities first. Also, it is very hard to achieve a good detection rate with a custom classifier. Other methods are more time consuming.