How do I know if PDF pages are color or black-and-white?

Given a set of PDF files in which some pages are color and the rest are black and white, is there a program that can tell which pages are which? This would be useful, for instance, when printing a thesis and paying extra only for the color pages. Bonus points for a solution that takes double-sided printing into account and sends a black-and-white page to the color printer if it shares a sheet with a color page.

asked Mar 13 '09 04:03 by Anil



2 Answers

This is one of the most interesting questions I've seen! I agree with some of the other posts that rendering to a bitmap and then analyzing the bitmap will be the most reliable solution. For simple PDFs, here's a faster but less complete approach.

  1. Parse each PDF page
  2. Look for color directives (g, rg, k, sc, scn, etc)
  3. Look for embedded images, analyze for color

My solution below does #1 and half of #2. The other half of #2 would be to follow up with user-defined color, which involves looking up the /ColorSpace entries in the page and decoding them -- contact me offline if this is interesting to you, as it's very doable but not in 5 minutes.

First the main program:

    use CAM::PDF;

    my $infile = shift;
    my $pdf = CAM::PDF->new($infile);
    PAGE:
    for my $p (1 .. $pdf->numPages) {
       my $tree = $pdf->getPageContentTree($p);
       if (!$tree) {
          print "Failed to parse page $p\n";
          next PAGE;
       }
       my $colors = $tree->traverse('My::Renderer::FindColors')->{colors};
       my $uncertain = 0;
       for my $color (@{$colors}) {
          my ($name, @rest) = @{$color};
          if ($name eq 'g') {
          } elsif ($name eq 'rgb') {
             my ($r, $g, $b) = @rest;
             if ($r != $g || $r != $b) {
                print "Page $p is color\n";
                next PAGE;
             }
          } elsif ($name eq 'cmyk') {
             my ($c, $m, $y, $k) = @rest;
             if ($c != 0 || $m != 0 || $y != 0) {
                print "Page $p is color\n";
                next PAGE;
             }
          } else {
             $uncertain = $name;
          }
       }
       if ($uncertain) {
          print "Page $p has user-defined color ($uncertain), needs more investigation\n";
       } else {
          print "Page $p is grayscale\n";
       }
    }

And then here's the helper renderer that handles color directives on each page:

    package My::Renderer::FindColors;

    sub new {
       my $pkg = shift;
       return bless { colors => [] }, $pkg;
    }
    sub clone {
       my $self = shift;
       my $pkg = ref $self;
       # Clones share one colors list, so every directive on the page is collected
       return bless { colors => $self->{colors}, cs => $self->{cs}, CS => $self->{CS} }, $pkg;
    }
    sub rg {
       my ($self, $r, $g, $b) = @_;
       push @{$self->{colors}}, ['rgb', $r, $g, $b];
    }
    sub g {
       my ($self, $gray) = @_;
       # Record gray as an RGB triple with equal components
       push @{$self->{colors}}, ['rgb', $gray, $gray, $gray];
    }
    sub k {
       my ($self, $c, $m, $y, $k) = @_;
       push @{$self->{colors}}, ['cmyk', $c, $m, $y, $k];
    }
    # cs selects the non-stroking color space, CS the stroking one
    sub cs {
       my ($self, $name) = @_;
       $self->{cs} = $name;
    }
    sub CS {
       my ($self, $name) = @_;
       $self->{CS} = $name;
    }
    sub _sc {
       my ($self, $cs, @rest) = @_;
       return if !$cs; # syntax error
       if ($cs eq 'DeviceRGB') { $self->rg(@rest); }
       elsif ($cs eq 'DeviceGray') { $self->g(@rest); }
       elsif ($cs eq 'DeviceCMYK') { $self->k(@rest); }
       else { push @{$self->{colors}}, [$cs, @rest]; }
    }
    sub sc {
       my ($self, @rest) = @_;
       $self->_sc($self->{cs}, @rest);
    }
    sub SC {
       my ($self, @rest) = @_;
       $self->_sc($self->{CS}, @rest);
    }
    # The remaining operators take the same operands as their counterparts
    sub scn { sc(@_); }
    sub SCN { SC(@_); }
    sub RG { rg(@_); }
    sub G { g(@_); }
    sub K { k(@_); }
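To show step 2 in isolation, here is a rough Python sketch that scans an already-decoded content stream for the device color operators (g, rg, k and their uppercase stroking forms) and applies the same gray test as the Perl code above. It is not a port of CAM::PDF: the function names are made up for illustration, and the whitespace tokenizer is a simplifying assumption that ignores string literals, inline images, and the cs/sc user-defined color spaces.

```python
import re

# Number of numeric operands each device color operator takes.
ARITY = {"g": 1, "G": 1, "rg": 3, "RG": 3, "k": 4, "K": 4}

NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)$")


def find_device_colors(stream_text):
    """Return (operator, operands) pairs found in a decoded content stream.

    Naive assumption: tokens are separated by whitespace and no operator
    name appears inside a string literal or inline image.
    """
    tokens = stream_text.split()
    found = []
    for i, tok in enumerate(tokens):
        n = ARITY.get(tok)
        if n is None or i < n:
            continue
        operands = tokens[i - n:i]
        if all(NUMBER.match(t) for t in operands):
            found.append((tok.lower(), tuple(float(t) for t in operands)))
    return found


def uses_color(stream_text):
    """True if any rg/RG or k/K directive sets a non-gray color."""
    for op, vals in find_device_colors(stream_text):
        if op == "rg" and not (vals[0] == vals[1] == vals[2]):
            return True
        if op == "k" and any(v != 0 for v in vals[:3]):
            return True
    return False
```

For instance, `uses_color("1 0 0 rg 0 0 10 10 re f")` reports color, while a stream that only sets `0.5 g` does not.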
answered Oct 12 '22 10:10 by Chris Dolan


Newer versions of Ghostscript (version 9.05 and later) include a "device" called inkcov. It calculates the ink coverage of each page (not for each image) in Cyan (C), Magenta (M), Yellow (Y) and Black (K) values, where 0.00000 means 0%, and 1.00000 means 100% (see Detecting all pages which contain color).

For example:

    $ gs -q -o - -sDEVICE=inkcov file.pdf
     0.11264  0.11605  0.11605  0.09364 CMYK OK
     0.11260  0.11601  0.11601  0.09360 CMYK OK

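Each output line corresponds to one page, in order. If any of the C, M, or Y values on a line is nonzero, that page contains color; if all three are 0.00000, the page uses black ink only.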
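If you would rather post-process the numbers in a script, the check can be sketched in Python as follows. This assumes the text was captured from `gs -q -o - -sDEVICE=inkcov file.pdf`, with one `C M Y K CMYK OK` line per page in page order; the function name `color_pages` is just an illustrative choice.

```python
def color_pages(inkcov_output):
    """Return the 1-based numbers of pages with nonzero C, M, or Y coverage.

    Assumes one coverage line per page, in page order, in the form
    "C M Y K CMYK OK". Lines that do not match that shape are skipped.
    """
    pages = []
    page = 0
    for line in inkcov_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[4] == "CMYK":
            page += 1
            c, m, y = (float(v) for v in fields[:3])
            if c > 0 or m > 0 or y > 0:
                pages.append(page)
    return pages


# First line is taken from the example above; the second is a made-up
# black-only page added for illustration.
sample = (
    " 0.11264  0.11605  0.11605  0.09364 CMYK OK\n"
    " 0.00000  0.00000  0.00000  0.02230 CMYK OK\n"
)
print(color_pages(sample))
```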

To list only the pages that contain color, use this one-liner:

    $ gs -o - -sDEVICE=inkcov file.pdf |tail -n +4 |sed '/^Page*/N;s/\n//'|sed -E '/Page [0-9]+ 0.00000  0.00000  0.00000  / d'
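For the double-sided case raised in the question, once you have the list of color pages you can group pages into sheets and send a whole sheet to the color printer whenever either side has color. A minimal Python sketch, assuming pages are printed in order with pages 1 and 2 on the first sheet, 3 and 4 on the second, and so on (the function name `split_for_duplex` is hypothetical):

```python
def split_for_duplex(color_pages, total_pages):
    """Split pages into (color_printer_pages, bw_printer_pages).

    Assumes sheet n carries pages 2n-1 (front) and 2n (back). A sheet goes
    to the color printer if either of its pages is a color page.
    """
    color = set(color_pages)
    to_color, to_bw = [], []
    for front in range(1, total_pages + 1, 2):
        # The last sheet has only a front side when total_pages is odd.
        sheet = [p for p in (front, front + 1) if p <= total_pages]
        if any(p in color for p in sheet):
            to_color.extend(sheet)
        else:
            to_bw.extend(sheet)
    return to_color, to_bw
```

With page 3 as the only color page of a 5-page document, pages 3 and 4 go to the color printer and pages 1, 2, and 5 to the black-and-white one.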
answered Oct 12 '22 10:10 by Matteo