What are the ways to check if a PDF file is colored or grayscale or black/white?
You can use Ghostscript's inkcov device to get color information about each PDF page. Here is an example command for a sample PDF of mine (cmyk.pdf), together with its output:
gs -o - -sDEVICE=inkcov cmyk.pdf
GPL Ghostscript 9.10 (2013-08-30)
Processing pages 1 through 5.
Page 1
0.00000 0.00000 0.00000 0.02231 CMYK OK
Page 2
0.02360 0.02360 0.02360 0.02360 CMYK OK
Page 3
0.02525 0.02525 0.02525 0.00000 CMYK OK
Page 4
0.00000 0.00000 0.00000 0.01983 CMYK OK
Page 5
0.13274 0.13274 0.13274 0.03355 CMYK OK
If you add the -q parameter, the result is this:
gs -q -o - -sDEVICE=inkcov cmyk.pdf
0.00000 0.00000 0.00000 0.02231 CMYK OK
0.02360 0.02360 0.02360 0.02360 CMYK OK
0.02525 0.02525 0.02525 0.00000 CMYK OK
0.00000 0.00000 0.00000 0.01983 CMYK OK
0.13274 0.13274 0.13274 0.03355 CMYK OK
How to interpret these numbers?
A value of 0.00000 means that none of the respective ink is used. A value of 1.00000 would mean 100% coverage of the sheet with the respective ink.
The four values on each line give, in this order, the coverage for Cyan, Magenta, Yellow and Black.
Look at the values for page 1: the same value, 0.00000, for Cyan, Magenta and Yellow, but 0.02231 for Black. This means page 1 uses black ink only, and 2.231% of the page's area is covered by black ink.
Take page 2: here each of the 4 inks is given with a value of 0.02360. Each ink (including Black) covers 2.36% of the full page.
Look also at the values for page 3: 0.02525 for C, M and Y and 0.00000 for Black. So this page does not use black ink at all, but uses the same amount of each colored ink to cover an identically sized area of 2.525% of the full page.
Page 4: result is similar to page 1.
Page 5: See for yourself...
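If you want to apply this reading of the numbers automatically, here is a minimal Python sketch. The function names (parse_inkcov, classify_page, run_inkcov) and the eps threshold are my own assumptions for illustration and are not part of Ghostscript. Only the gs command line is taken from above. Note that a page like page 2 (an RGB gray that ends up as equal parts C, M, Y and K) is reported as "uses colored ink" even though it may look gray in a viewer.

```python
import re
import subprocess

# One inkcov result line looks like: "0.02360 0.02360 0.02360 0.02360 CMYK OK"
INKCOV_LINE = re.compile(
    r"^\s*(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)\s+CMYK\s+OK\s*$"
)


def parse_inkcov(output):
    """Return one (c, m, y, k) tuple per page found in inkcov text output."""
    pages = []
    for line in output.splitlines():
        match = INKCOV_LINE.match(line)
        if match:  # lines such as "Page 1" or the banner are skipped
            pages.append(tuple(float(value) for value in match.groups()))
    return pages


def classify_page(cmyk, eps=1e-6):
    """Classify a page by which inks it uses (eps is an assumed threshold)."""
    c, m, y, k = cmyk
    if max(c, m, y, k) <= eps:
        return "blank"
    if max(c, m, y) <= eps:
        return "black ink only"
    return "uses colored ink"


def run_inkcov(pdf_path, gs="gs"):
    """Run the gs command shown above; requires Ghostscript to be installed."""
    result = subprocess.run(
        [gs, "-q", "-o", "-", "-sDEVICE=inkcov", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# The output quoted above for cmyk.pdf, used here instead of a live gs run.
SAMPLE = """\
0.00000 0.00000 0.00000 0.02231 CMYK OK
0.02360 0.02360 0.02360 0.02360 CMYK OK
0.02525 0.02525 0.02525 0.00000 CMYK OK
0.00000 0.00000 0.00000 0.01983 CMYK OK
0.13274 0.13274 0.13274 0.03355 CMYK OK
"""

if __name__ == "__main__":
    for number, page in enumerate(parse_inkcov(SAMPLE), start=1):
        print(number, classify_page(page))
```

For a real file you would pass run_inkcov("cmyk.pdf") to parse_inkcov instead of SAMPLE. "Black ink only" does not tell you whether a page is pure black/white or contains shades of gray; inkcov reports coverage, not tone.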
Caveats:
The inkcov device always prints CMYK values, never RGB values. The reason for this is that it converts all RGB color shades into CMYK before analysing the color coverage of pages. This of course introduces some inaccuracies (which you have to take into account before you rely on this tool).
... inkcov device.
The following picture roughly reproduces the 5 pages of the cmyk.pdf used above. This should give you an approximate impression of how they look in a PDF viewer. It should make it easier to comprehend how the different values for the ink coverage quoted above add up:
Here is the Ghostscript command that I originally used to create the cmyk.pdf used above:
gs \
-o cmyk.pdf \
-sDEVICE=pdfwrite \
-g5950x2105 \
-c "/F1 {100 100 moveto /Helvetica findfont 42 scalefont setfont} def" \
-c "F1 (100% 'pure' black) show showpage" \
-c "F1 .5 .5 .5 setrgbcolor (50% 'rich' rgbgray) show showpage" \
-c "F1 .5 .5 .5 0 setcmykcolor (50% 'rich' cmykgray) show showpage" \
-c "F1 .5 setgray (50% 'pure' gray) show showpage" \
-c " 1 0 0 0 setcmykcolor 100 130 64 64 rectfill" \
-c " 0 1 0 0 setcmykcolor 200 130 64 64 rectfill" \
-c " 0 0 1 0 setcmykcolor 300 130 64 64 rectfill" \
-c " 0 0 0 1 setcmykcolor 400 130 64 64 rectfill" \
-c " 0 1 1 0 setcmykcolor 100 30 64 64 rectfill" \
-c " 1 0 1 0 setcmykcolor 200 30 64 64 rectfill" \
-c " 1 1 0 0 setcmykcolor 300 30 64 64 rectfill" \
-c " 1 1 1 0 setcmykcolor 400 30 64 64 rectfill showpage"
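As a rough cross-check of the page 5 values, you can estimate the expected coverage from the command itself. The sketch below is only back-of-the-envelope arithmetic under one assumption that is not stated above: that pdfwrite runs at its default resolution of 720 dpi, so that -g5950x2105 yields a page of 595 x 210.5 points. The names used (SQUARES, expected_coverage) are made up for this example.

```python
# Assumption: pdfwrite's default resolution of 720 dpi, so -g5950x2105
# (device pixels) corresponds to a page of 595 x 210.5 PostScript points.
PAGE_WIDTH_PT = 5950 / 720 * 72
PAGE_HEIGHT_PT = 2105 / 720 * 72

# Each rectfill on page 5 draws a 64 x 64 point square.
SQUARE_AREA_PT2 = 64 * 64

# CMYK values of the eight squares on page 5, copied from the command above.
SQUARES = [
    (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1),
    (0, 1, 1, 0), (1, 0, 1, 0), (1, 1, 0, 0), (1, 1, 1, 0),
]


def expected_coverage(squares=SQUARES):
    """Return the estimated (c, m, y, k) coverage as fractions of the page."""
    page_area = PAGE_WIDTH_PT * PAGE_HEIGHT_PT
    return tuple(
        sum(square[channel] for square in squares) * SQUARE_AREA_PT2 / page_area
        for channel in range(4)
    )


if __name__ == "__main__":
    print(["%.5f" % value for value in expected_coverage()])
```

Four of the eight squares use each of C, M and Y, and one square uses K, so the estimate comes out at roughly 13.1% for each colored ink and roughly 3.3% for Black. That is close to the 0.13274 and 0.03355 that inkcov reports; the small difference is presumably a rasterization effect.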
The traditional way of doing this would be to use a preflight tool such as the tools from callas software (Caution: I'm associated with this company). But if this aspect of the PDF is the only aspect you want to check, that's probably going to be overkill.
I would think that the second possible approach would be to use a tool that can convert a PDF to images and then analyse the images (convert to a CMYK image - then see if there is anything on the C, M or Y channels in that generated image).
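To make the second approach a bit more concrete, here is a minimal Python sketch of the channel check. It assumes you have already rendered a page to RGB pixels with whatever tool you prefer (that step is left out here), and it uses the naive textbook RGB to CMYK formula rather than an ICC-based conversion, so treat it as an illustration only. The function names and the tolerance parameter are hypothetical.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0-255) to CMYK (0.0-1.0) conversion; no ICC profile is used."""
    if (r, g, b) == (0, 0, 0):
        return 0.0, 0.0, 0.0, 1.0  # pure black: avoid dividing by zero below
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)
    c = (1 - rf - k) / (1 - k)
    m = (1 - gf - k) / (1 - k)
    y = (1 - bf - k) / (1 - k)
    return c, m, y, k


def has_color(pixels, tolerance=0.0):
    """True if any pixel puts more than `tolerance` on the C, M or Y channel."""
    for r, g, b in pixels:
        c, m, y, _ = rgb_to_cmyk(r, g, b)
        if max(c, m, y) > tolerance:
            return True
    return False


if __name__ == "__main__":
    gray_page = [(255, 255, 255), (128, 128, 128), (0, 0, 0)]
    color_page = [(255, 255, 255), (200, 10, 10)]
    print(has_color(gray_page), has_color(color_page))
```

With this naive formula every neutral pixel (R = G = B) ends up on the K channel only, so anything left on C, M or Y indicates color. A small non-zero tolerance may be useful if the rendering step introduces slight color casts.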
Amyn,
This is Mohammad from LEADTOOLS support. I noticed that you posted a similar question on our LEADTOOLS support forums. I have already posted a reply there and here is a slightly modified copy of that reply:
/******************************************/
If the PDF page contains only black text on a white background, loading it using the default settings will produce gray shades around the text edges to smooth their display (anti-aliasing), as shown in the attached image.
If you want such black text to be rasterized as pure black without gray shades, change the settings before loading using LEADTOOLS v18 as follows:
Set the UsePdfEngine property of the loading PDF options to true like this:
RasterCodecs.Options.Pdf.Load.UsePdfEngine = true;
Set the TextAlpha property of the loading PDF options to 1 like this:
RasterCodecs.Options.Pdf.Load.TextAlpha = 1;
Load the PDF file using default bits per pixel (24-bits):
RasterCodecs.Load("BlackTextWhiteBackground.pdf");
Count the unique colors in the file using the ColorCountCommand Class function. If the number of colors is more than two, the image will not be black and white. This could happen if it contains non-black text or other color images or graphics objects:
ColorCountCommand MyCommand = new ColorCountCommand();
MyCommand.Run(_viewer.Image);
Make sure that the "Leadtools.PdfEngine.dll" is placed in the output folder of your project (next to the EXE).
/******************************************/
Edit to answer comment about detecting gray page:
It is possible to tell whether the page is color or purely shades of gray. Add the following code after loading as 24-bits and counting the colors:
if (MyCommand.ColorCount > 2 && MyCommand.ColorCount <= 256) // could be gray
{
    // Reduce to 8 bits per pixel with an optimized palette and no dithering
    ColorResolutionCommand colorRes = new ColorResolutionCommand(ColorResolutionCommandMode.InPlace, 8,
        RasterByteOrder.Bgr, RasterDitheringMethod.None, ColorResolutionCommandPaletteFlags.Optimized, null);
    colorRes.Run(_viewer.Image);
    if (_viewer.Image.GrayscaleMode == RasterGrayscaleMode.None)
        MessageBox.Show("image is NOT grayscale");
    else
        MessageBox.Show("image is grayscale, its mode is: " + _viewer.Image.GrayscaleMode);
}
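For readers who do not use LEADTOOLS, the counting logic above can be sketched in a library-independent way. The following Python snippet is not LEADTOOLS code; it assumes the page has already been rasterized to 24-bit RGB tuples, and the function name classify_by_color_count is made up for this example. Unlike a plain count, it also checks which colors are present, because a page with exactly two colors (say red and white) is not black and white.

```python
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)


def classify_by_color_count(pixels):
    """Classify 24-bit RGB pixels as black and white, grayscale or color."""
    colors = set(pixels)  # the unique colors, as a color count would see them
    if colors <= {BLACK, WHITE}:
        return "black and white"
    # 24-bit gray pixels can take at most 256 distinct values (R = G = B),
    # so more than 256 unique colors can only mean a color image.
    if len(colors) > 256:
        return "color"
    if all(r == g == b for r, g, b in colors):
        return "grayscale"
    return "color"


if __name__ == "__main__":
    print(classify_by_color_count([BLACK, WHITE, WHITE]))
    print(classify_by_color_count([BLACK, (90, 90, 90), WHITE]))
    print(classify_by_color_count([WHITE, (255, 0, 0)]))
```

Keep in mind that anti-aliased black text produces gray edge pixels, so a page of plain black text would come out as "grayscale" here unless it is rasterized without smoothing, as described in the steps above.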