So I often run huge double-sided scan jobs on an unintelligent Canon multifunction, which leaves me with a huge folder of JPEGs. Am I insane to consider using PIL to analyze a folder of images to detect scans of blank pages and flag them for deletion?
Leaving the folder-crawling and flagging parts out, I imagine this would look something like:
I know this is sort of an edge case, but can anyone with PIL experience lend some pointers?
How do you know if an image is black in Python? You can use OpenCV together with NumPy for color detection. OpenCV reports pixel values in BGR order: (255, 255, 255) is white and (0, 0, 0) is black.
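As a rough illustration of that kind of check, the sketch below classifies an image array as mostly white or mostly black from its mean channel values. It assumes the image is already loaded as a NumPy array in BGR order (the layout `cv2.imread` returns); the function name `classify_shade` and the tolerance `tol` are hypothetical choices, not part of either library.

```python
import numpy as np


def classify_shade(bgr, tol=10):
    """Return 'white', 'black', or 'other' from the mean B, G, R values.

    `tol` is an assumed tolerance around pure white/black, not a library default.
    """
    # Flatten to a list of pixels and average each channel separately
    mean_bgr = bgr.reshape(-1, bgr.shape[-1]).mean(axis=0)
    if np.all(mean_bgr >= 255 - tol):
        return "white"
    if np.all(mean_bgr <= tol):
        return "black"
    return "other"


# Synthetic test images standing in for loaded scans
white = np.full((4, 4, 3), 255, dtype=np.uint8)
black = np.zeros((4, 4, 3), dtype=np.uint8)
mixed = np.concatenate([white, black], axis=0)

print(classify_shade(white), classify_shade(black), classify_shade(mixed))
```

A mean-only test like this will miss pages with a small amount of dark content, so it is probably best treated as a first filter.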
PIL is the Python Imaging Library, which provides the Python interpreter with image editing capabilities. The PIL.Image.new() method creates a new image with the given mode and size. Size is given as a (width, height) tuple, in pixels.
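For the blank-page question specifically, a minimal PIL sketch might look like the following. It uses `Image.new()` to build stand-in pages and `ImageStat.Stat` to read the grayscale mean and standard deviation. The helper name `looks_blank` and both thresholds are assumptions for illustration; real scans would likely need them tuned.

```python
from PIL import Image, ImageDraw, ImageStat


def looks_blank(img, min_mean=240, max_stddev=5):
    """Guess whether a page is blank: bright overall and nearly uniform.

    Both thresholds are assumed starting points, not tested on real scans.
    """
    gray = img.convert("L")  # collapse to a single 8-bit luminance band
    stat = ImageStat.Stat(gray)
    return stat.mean[0] >= min_mean and stat.stddev[0] <= max_stddev


# Stand-ins for scanned pages: one empty, one with a dark block of "content"
blank_page = Image.new("L", (200, 300), 255)
printed_page = Image.new("L", (200, 300), 255)
ImageDraw.Draw(printed_page).rectangle([20, 20, 180, 150], fill=0)

print(looks_blank(blank_page), looks_blank(printed_page))
```

On real scans, `Image.open(path)` would replace the synthetic pages, and scanner noise or show-through from the reverse side may push the standard deviation above a threshold this tight.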
Here is an alternative solution, using mahotas and milk.
Start by creating two directories, positives/ and negatives/, where you will manually pick out a few examples. The rest of the images go in an unlabeled/ directory.
In the code below I used jug to give you the possibility of running it on multiple processors, but the code also works if you remove every line which mentions TaskGenerator:
from glob import glob

import mahotas
import mahotas.features
import milk
from jug import TaskGenerator


@TaskGenerator
def features_for(imname):
    # Haralick texture features, averaged over all directions
    img = mahotas.imread(imname)
    return mahotas.features.haralick(img).mean(0)


@TaskGenerator
def learn_model(features, labels):
    learner = milk.defaultclassifier()
    return learner.train(features, labels)


@TaskGenerator
def classify(model, features):
    return model.apply(features)


positives = glob('positives/*.jpg')
negatives = glob('negatives/*.jpg')
unlabeled = glob('unlabeled/*.jpg')

# Feature order must match label order: negatives first, then positives.
# A list (rather than a lazy map object) keeps this working on Python 3.
features = [features_for(im) for im in negatives + positives]
labels = [0] * len(negatives) + [1] * len(positives)

model = learn_model(features, labels)
labeled = [classify(model, features_for(u)) for u in unlabeled]
This uses texture features, which is probably good enough, but you can play with other features in mahotas.features if you'd like (or try mahotas.surf, but that gets more complicated). In general, I have found it hard to do classification with the sort of hard thresholds you are looking for unless the scanning is very controlled.