 

Check if PDF is colored or grayscale or black&white [closed]

Tags:

pdf

colors

How can I check whether a PDF file is color, grayscale, or black-and-white?

asked Sep 02 '14 at 13:09 by amyn


3 Answers

You can use Ghostscript's inkcov device to get color information about each PDF page. Here is an example command for a sample PDF (cmyk.pdf) of mine with its output:

gs -o - -sDEVICE=inkcov cmyk.pdf

   GPL Ghostscript 9.10 (2013-08-30)
   Processing pages 1 through 5.

   Page 1
    0.00000  0.00000  0.00000  0.02231 CMYK OK
   Page 2
    0.02360  0.02360  0.02360  0.02360 CMYK OK
   Page 3
    0.02525  0.02525  0.02525  0.00000 CMYK OK
   Page 4
    0.00000  0.00000  0.00000  0.01983 CMYK OK
   Page 5
    0.13274  0.13274  0.13274  0.03355 CMYK OK

If you add the -q parameter, the result is this:

gs -q -o - -sDEVICE=inkcov cmyk.pdf

    0.00000  0.00000  0.00000  0.02231 CMYK OK
    0.02360  0.02360  0.02360  0.02360 CMYK OK
    0.02525  0.02525  0.02525  0.00000 CMYK OK
    0.00000  0.00000  0.00000  0.01983 CMYK OK
    0.13274  0.13274  0.13274  0.03355 CMYK OK
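To use these numbers in a script, you could run the same command from Python and pick out the four values on each result line. This is a minimal sketch, not part of Ghostscript: it assumes `gs` is on your PATH and that result lines keep the `C M Y K CMYK OK` layout shown above (the trailing `OK` is treated as optional). The function names `parse_inkcov` and `run_inkcov` are hypothetical.

```python
import re
import subprocess

# One inkcov result line: four coverage values followed by "CMYK" (and usually "OK").
# Assumption: the layout matches the output shown above.
INKCOV_LINE = re.compile(
    r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+CMYK(?:\s+OK)?\s*$"
)


def parse_inkcov(output: str) -> list[tuple[float, ...]]:
    """Return one (C, M, Y, K) tuple per page; other lines ("Page 1", banners) are skipped."""
    pages = []
    for line in output.splitlines():
        match = INKCOV_LINE.match(line)
        if match:
            pages.append(tuple(float(value) for value in match.groups()))
    return pages


def run_inkcov(pdf_path: str) -> list[tuple[float, ...]]:
    """Run Ghostscript's inkcov device on pdf_path and parse its stdout."""
    result = subprocess.run(
        ["gs", "-q", "-o", "-", "-sDEVICE=inkcov", pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_inkcov(result.stdout)
```

Because the regex ignores everything except result lines, the parser works on output produced with or without -q. For example, `run_inkcov("cmyk.pdf")` should return five tuples, the first being `(0.0, 0.0, 0.0, 0.02231)` if your Ghostscript reports the same values as above.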

How do you interpret these numbers?

  1. Each column represents a color, from left to right: Cyan (C), Magenta (M), Yellow (Y) and Black (K).
  2. A value of 0.00000 means the respective ink is not used at all; a value of 1.00000 would mean 100% coverage of the page with that ink. For example, the value 0.02360 for each of the four inks on page 2 means that each ink (including black) covers 2.36% of the full page.

Look at the values for page 1: the same value, 0.00000, for Cyan, Magenta and Yellow, but 0.02231 for Black. This means that page 1 uses black ink only, and 2.231% of the page's area is covered by black ink.

Take page 2: here each of the 4 inks has a value of 0.02360, so each ink covers 2.36% of the full page.

Look also at the values for page 3: 0.02525 for C, M and Y and 0.00000 for Black. So this page does not use black ink at all, but uses the same amount of each colored ink to cover an identically sized area of 2.525% of the full page.

Page 4: the result is similar to page 1 (black ink only, covering 1.983% of the page).

Page 5: see for yourself (C, M and Y each cover 13.274% of the page, black covers 3.355%).

Caveats:

  1. The inkcov device always prints CMYK values, never RGB values. This is because it converts all RGB colors into CMYK before analysing the color coverage of pages. This of course introduces some inaccuracies, which you have to take into account before you rely on this tool.
  2. You need Ghostscript 9.05 or later (on MS Windows: 9.07 or later). Earlier versions do not have the inkcov device.
  3. You will certainly come across PDF pages that appear to contain no color, only gray shades, when viewed in a PDF viewer or printed on paper, yet inkcov reports nonzero C, M and Y values for them. This is because gray shades can be composed of equal amounts of different colored inks (as on pages 2 and 3 above).
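Putting the interpretation and the caveats together, a rough per-page verdict could look like the sketch below. It is only a heuristic: the tolerance `tol` is an arbitrary assumption meant to absorb rounding and conversion inaccuracies, and `classify_page` is a hypothetical name. Because inkcov reports totals per page, equal C, M and Y values do not prove that a page is gray: page 5 of cmyk.pdf has equal C, M and Y coverage (0.13274 each) but consists of colored squares, so the "equal CMY" verdict only means "gray is possible, check further".

```python
def classify_page(c: float, m: float, y: float, k: float, tol: float = 1e-4) -> str:
    """Heuristic verdict for one page from its inkcov C, M, Y, K values (0.0 to 1.0).

    tol is an arbitrary threshold (an assumption), not a Ghostscript setting.
    """
    colored = (c, m, y)
    if max(c, m, y, k) <= tol:
        return "blank"
    if max(colored) <= tol:
        # Only K ink: pure black text or K-only gray, like pages 1 and 4 of cmyk.pdf.
        return "black ink only"
    if max(colored) - min(colored) <= tol:
        # Equal C, M and Y totals: may be composed gray (pages 2 and 3),
        # but may also be a mix of colors (page 5), so inspect further.
        return "equal CMY"
    return "color"
```

Feeding it the five rows above would give "black ink only" for pages 1 and 4 and "equal CMY" for pages 2, 3 and 5.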

Update

The following picture roughly reproduces the 5 pages of the cmyk.pdf used above. It should give you an approximate impression of how they look in a PDF viewer and make it easier to see how the ink coverage values quoted above add up:

[Image: the 5 pages of cmyk.pdf]

Here is the Ghostscript command that I originally used to create cmyk.pdf:

gs                  \
  -o cmyk.pdf       \
  -sDEVICE=pdfwrite \
  -g5950x2105       \
  -c "/F1 {100 100 moveto /Helvetica findfont 42 scalefont setfont} def" \
  -c "F1                        (100% 'pure' black)    show showpage"    \
  -c "F1 .5 .5 .5   setrgbcolor  (50% 'rich' rgbgray)  show showpage"    \
  -c "F1 .5 .5 .5 0 setcmykcolor (50% 'rich' cmykgray) show showpage"    \
  -c "F1 .5         setgray      (50% 'pure' gray)     show showpage"    \
  -c "   1 0 0 0 setcmykcolor 100 130 64 64 rectfill"                    \
  -c "   0 1 0 0 setcmykcolor 200 130 64 64 rectfill"                    \
  -c "   0 0 1 0 setcmykcolor 300 130 64 64 rectfill"                    \
  -c "   0 0 0 1 setcmykcolor 400 130 64 64 rectfill"                    \
  -c "   0 1 1 0 setcmykcolor 100  30 64 64 rectfill"                    \
  -c "   1 0 1 0 setcmykcolor 200  30 64 64 rectfill"                    \
  -c "   1 1 0 0 setcmykcolor 300  30 64 64 rectfill"                    \
  -c "   1 1 1 0 setcmykcolor 400  30 64 64 rectfill        showpage"

answered Nov 15 '22 at 04:11 by Kurt Pfeifle


The traditional way of doing this would be to use a preflight tool, such as the tools from callas software (disclosure: I'm associated with this company). But if this is the only aspect of the PDF you want to check, that's probably overkill.

A second possible approach is to use a tool that converts the PDF to images and then analyse those images: convert each page to a CMYK image, then check whether there is anything on the C, M or Y channels of that image.
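The channel check can be sketched in plain Python on a list of rendered RGB pixels (how you render the page is left open). The conversion below is a naive formula with full black extraction, chosen as an assumption for illustration; real tools typically use ICC-based conversions, which can put ink on C, M and Y even for neutral grays (inkcov's own conversion does this for the RGB gray on page 2 above). With this naive formula a pixel with equal R, G and B gets C = M = Y = 0, whereas a conversion that keeps K at 0 would put every gray onto the C, M and Y channels and defeat the check. The names `rgb_to_cmyk` and `uses_color_ink` are hypothetical.

```python
def rgb_to_cmyk(r: int, g: int, b: int) -> tuple[float, float, float, float]:
    """Naive RGB (0-255) to CMYK (0.0-1.0) with full black extraction (an assumption)."""
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1.0 - max(rf, gf, bf)
    if k >= 1.0:
        # Pure black: all ink on K, which also avoids dividing by zero below.
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - rf - k) / (1.0 - k)
    m = (1.0 - gf - k) / (1.0 - k)
    y = (1.0 - bf - k) / (1.0 - k)
    return c, m, y, k


def uses_color_ink(pixels, tol: float = 0.01) -> bool:
    """True if any (R, G, B) pixel has C, M or Y above tol after conversion."""
    for r, g, b in pixels:
        c, m, y, _ = rgb_to_cmyk(r, g, b)
        if max(c, m, y) > tol:
            return True
    return False
```

The tolerance `tol` is another assumption; it lets tiny channel differences (for example from compression) pass as "no color".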

answered Nov 15 '22 at 04:11 by David van Driessche


Amyn,

This is Mohammad from LEADTOOLS support. I noticed that you posted a similar question on our LEADTOOLS support forums. I have already posted a reply there and here is a slightly modified copy of that reply:

/******************************************/

If the PDF page contains only black text on a white background, loading it with the default settings will produce gray shades around the text edges to make the text display more smoothly, as shown in the attached image.

If you want such black text to be rasterized as pure black without gray shades, change the following settings before loading (using LEADTOOLS v18):

  1. Set the UsePdfEngine property of the loading PDF options to true like this:

    RasterCodecs.Options.Pdf.Load.UsePdfEngine = true;

  2. Set the TextAlpha property of the loading PDF options to 1 like this:

    RasterCodecs.Options.Pdf.Load.TextAlpha = 1;

  3. Load the PDF file using default bits per pixel (24-bits):

    RasterCodecs.Load("BlackTextWhiteBackground.pdf");

  4. Count the unique colors in the image using the ColorCountCommand class. If there are more than two colors, the image is not black and white. This can happen if the page contains non-black text or other color images or graphics objects:

    ColorCountCommand MyCommand = new ColorCountCommand();
    MyCommand.Run(_viewer.Image);

Make sure that the "Leadtools.PdfEngine.dll" is placed in the output folder of your project (next to the EXE).

/******************************************/

[Image: black text rendered with gray shades]
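The color-count test in step 4 does not depend on LEADTOOLS. Here is a library-neutral Python sketch, assuming the page has already been rendered to 24-bit RGB pixel tuples by whatever tool you use; `count_colors` and `is_black_and_white` are hypothetical names. It also checks which colors are present, since a page of red text on white has two colors as well. As with the TextAlpha setting above, the page must be rendered without anti-aliasing, otherwise gray edge pixels make the test fail.

```python
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)


def count_colors(pixels) -> int:
    """Number of distinct (R, G, B) colors, the quantity step 4 looks at."""
    return len(set(pixels))


def is_black_and_white(pixels) -> bool:
    """True if every pixel is pure black or pure white.

    Two colors alone are not enough: red text on white also has two.
    """
    return set(pixels) <= {BLACK, WHITE}
```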

Edit, answering a comment about detecting gray pages:

It is possible to tell whether the page is color or purely shades of gray. Add the following code after loading the page at 24 bits per pixel and counting the colors:

if (MyCommand.ColorCount > 2 && MyCommand.ColorCount <= 256) // could be gray
{
   // Reduce the image in place to 8 bits per pixel with an optimized palette, without dithering
   ColorResolutionCommand colorRes = new ColorResolutionCommand(ColorResolutionCommandMode.InPlace, 8,
      RasterByteOrder.Bgr, RasterDitheringMethod.None, ColorResolutionCommandPaletteFlags.Optimized, null);
   colorRes.Run(_viewer.Image);
   // GrayscaleMode reports whether the resulting 8-bit image is grayscale
   if (_viewer.Image.GrayscaleMode == RasterGrayscaleMode.None)
      MessageBox.Show("image is NOT grayscale");
   else
      MessageBox.Show("image is grayscale, its mode is: " + _viewer.Image.GrayscaleMode);
}
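The `<= 256` bound in this code works because 24-bit pixels with R = G = B can take at most 256 distinct values, so an image with more colors cannot be purely gray. A direct check in plain Python, under the same assumption of 24-bit RGB pixel tuples, tests the distinct colors instead of converting to a palette. The optional `tol` (an assumption, not a LEADTOOLS parameter) tolerates slightly unequal channels, e.g. from lossy image compression; `is_grayscale` is a hypothetical name.

```python
def is_grayscale(pixels, tol: int = 0) -> bool:
    """True if every distinct (R, G, B) color has channels within tol of each other."""
    distinct = set(pixels)
    if tol == 0 and len(distinct) > 256:
        # More than 256 distinct colors cannot all satisfy R == G == B.
        return False
    return all(max(color) - min(color) <= tol for color in distinct)
```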

answered Nov 15 '22 at 04:11 by LEADTOOLS Support