
How to remove a black background from PDF text before printing

I have a PDF with a black background and white/yellow text.

How can I remove the black background before printing and invert the color of the text?

wgpubs asked Sep 28 '09 18:09


3 Answers

This is likely to be non-trivial to solve in general, but if you have a predictable collection of PDFs (say, all from the same source) then you may be able to hack together a quick solution like so:

  • install CAM::PDF from CPAN
  • run "getpdfpage.pl my.pdf 1 > page1.txt" to get the graphic codes for page 1
  • search for " rg" to find where the RGB fill color, which normally paints both text and filled backgrounds, is changed (or "RG" for the RGB stroke color, "g" or "G" for grayscale, "k" or "K" for CMYK colors, or "sc" or "SC" for special colorspaces)
  • edit page1.txt to set the colors you like
  • run "setpdfpage.pl my.pdf 1 page1.txt out.pdf"
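The editing step in the list above can also be scripted instead of done by hand. Here is a minimal sketch in Python (not the Perl used elsewhere in this answer, and not part of CAM::PDF), assuming page1.txt holds a plain-text content stream in which the black background and the white text are set with literal "0 0 0 rg" and "1 1 1 rg" operators. The helper name swap_fill_colors and the sample stream are made up for illustration; real files may use other operators or number formats.

```python
import re

# A PDF number as it might appear in a content stream: "0", "1.0", ".5"
NUM = r"(?:\d+\.?\d*|\.\d+)"
# Three numbers followed by the RGB fill operator "rg"
RG_OP = re.compile(r"(?<![\w.])(%s)\s+(%s)\s+(%s)\s+rg\b" % (NUM, NUM, NUM))

def swap_fill_colors(stream, mapping):
    """Rewrite each 'r g b rg' whose color is a key of mapping (hypothetical helper)."""
    def replace(match):
        old = tuple(float(x) for x in match.groups())
        new = mapping.get(old)
        if new is None:
            return match.group(0)  # leave other colors untouched
        return "%g %g %g rg" % new
    return RG_OP.sub(replace, stream)

# Assumed sample: a black full-page fill followed by white text
sample = "0 0 0 rg\n0 0 612 792 re f\n1 1 1 rg\nBT (Hi) Tj ET"
swapped = swap_fill_colors(sample, {(0, 0, 0): (1, 1, 1), (1, 1, 1): (0, 0, 0)})
print(swapped)
```

Doing both replacements in a single pass matters here: two sequential search-and-replace steps would turn black into white and then turn everything white back into black.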

All of this can be done programmatically instead of via command line tools too. getpdfpage.pl and setpdfpage.pl are simple little wrappers around the CAM::PDF API.

A general solution would be to use getPageContentTree() to parse the PDF page syntax, search for the color-changing operators, and alter them. But if your PDF uses a custom color space ("sc"), this can be tricky. Searching for the operator that does the full-page black fill could be hard too, depending on the geometry.
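To illustrate the "find every color operator and alter it" idea, here is a rough Python sketch that inverts every RGB and grayscale color in a content stream. It is a regex approximation, not a real parser like getPageContentTree(): it assumes a plain-text stream, would also rewrite anything that merely looks like a color operator inside a text string, and deliberately skips CMYK ("k"/"K") and special colorspaces ("sc"/"SC"), where subtracting from 1 is not a correct inversion. The function name invert_colors and the sample stream are hypothetical.

```python
import re

NUM = r"(?:\d+\.?\d*|\.\d+)"
# Three operands before rg/RG (RGB), or one operand before g/G (gray)
COLOR_OP = re.compile(
    r"(?<![\w.])((?:%s\s+){3})(rg|RG)\b|(?<![\w.])(%s)\s+(g|G)\b" % (NUM, NUM)
)

def invert_colors(stream):
    """Replace each RGB or gray component x with 1 - x (hypothetical helper)."""
    def flip(text):
        return "%g" % (1.0 - float(text))

    def replace(match):
        if match.group(2):  # RGB operator matched
            parts = [flip(n) for n in match.group(1).split()]
            return " ".join(parts) + " " + match.group(2)
        return flip(match.group(3)) + " " + match.group(4)  # gray operator

    return COLOR_OP.sub(replace, stream)

# Assumed sample: black page fill, yellow text, light gray stroke
sample = "0 0 0 rg\n0 0 612 792 re f\n1 1 0 rg\n0.25 G"
inverted = invert_colors(sample)
print(inverted)
```

Note that a plain inversion turns yellow text blue rather than black; mapping specific old colors to specific new ones, as recolor.pl below does, gives more control.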

If you provide a URL for a sample PDF, I could provide some more specific advice.

UPDATE: on a whim, I wrote a rudimentary color changer script that may work for some PDFs. To use it, run it as in this example, which turns any red-filled element green instead:

perl recolor.pl input.pdf '1 0 0 rg' '0 1 0 rg' out.pdf

This requires you to know the PDF syntax of the color directives you're trying to change, so it may still require something like the getpdfpage.pl steps recommended above.

And the source code:

#!/usr/bin/perl -w

use strict;
use CAM::PDF;
use CAM::PDF::Content;

# Content-stream operators that set a color: RGB, grayscale, CMYK, special colorspaces
my %COLOROPS = map {$_ => 1} qw(rg RG g G k K sc SC);

my $pdf = CAM::PDF->new(shift) || die $CAM::PDF::errstr;
# Parallel lists: a match on $oldcolors[$i] is replaced by $newcolors[$i]
my @oldcolors;
my @newcolors;
while (@ARGV >= 2) {
   push @oldcolors, parseColor(shift);
   push @newcolors, parseColor(shift);
}
my $out = shift || '-';

for my $p (1 .. $pdf->numPages) {
   my $page = $pdf->getPageContentTree($p);
   traverse($page->{blocks});
   $pdf->setPageContent($p, $page->toString());
}
$pdf->cleanoutput($out);

# Parse a string like '1 0 0 rg' into a single validated color operator
sub parseColor {
   my ($in) = @_;
   my $ops = CAM::PDF::Content->new($in);
   die 'Invalid color syntax in ' . $in if !$ops->validate();
   my @blocks = @{$ops->{blocks}};
   die 'Expected one color operator in ' . $in if @blocks != 1;
   my $color = $blocks[0];
   die 'Not a color operator in ' . $in if !exists $COLOROPS{$color->{name}};
   return $color;
}

# Recursively walk the content tree and rewrite color operators that match an old color
sub traverse {
   my ($blocks) = @_;
   for my $op (@{$blocks}) {
      if ($op->{type} eq 'block') {
         traverse($op->{value});
      } elsif (exists $COLOROPS{$op->{name}}) {
       COLOR:
         for (my $i=0; $i < @oldcolors; ++$i) {
            my $old = $oldcolors[$i];
            if ($old->{name} eq $op->{name} && @{$old->{args}} == @{$op->{args}}) {
               for (my $v=0; $v < @{$op->{args}}; ++$v) {
                  next COLOR if $old->{args}->[$v]->{value} != $op->{args}->[$v]->{value};
               }
               # match! so we will replace
               $op->{name} = $newcolors[$i]->{name};
               @{$op->{args}} = @{$newcolors[$i]->{args}};
               last COLOR;
            }
         }
      }
   }
}
Chris Dolan answered Sep 30 '22 12:09


I like Chris' solution, as it seems to be the best way to go. I haven't personally tried that, but one thing that did work for me was taking a screenshot of the PDF page in question, pasting it into an image viewer (I used IrfanView), and manipulating the colors until I got a white background with black text. The original PDF had a red background with black text.

I used IrfanView to convert the image to 2 colors (black and white). In your case, you might have to generate a negative of the image first and then convert to 2 colors (or the negative alone might be enough). The end result for me had some minor pixelation in the text, but for my purposes (a simple list from kids' school), it worked fine.
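For anyone who wants to script that instead of clicking through an image viewer, the negative-then-two-colors idea looks roughly like this in plain Python. This is only a sketch on a tiny made-up grayscale raster (0 is black, 255 is white); the function names and the threshold of 128 are my own assumptions, not anything IrfanView exposes.

```python
def negative(pixels):
    """Invert a grayscale raster: black becomes white and vice versa."""
    return [[255 - value for value in row] for row in pixels]

def to_two_colors(pixels, threshold=128):
    """Snap every pixel to pure black (0) or pure white (255)."""
    return [[255 if value >= threshold else 0 for value in row] for row in pixels]

# Assumed sample: black page with two bright (white/yellowish) text pixels
page = [
    [0, 0, 0, 0],
    [0, 250, 230, 0],
    [0, 0, 0, 0],
]
printable = to_two_colors(negative(page))
for row in printable:
    print(row)
```

The hard threshold is also where the pixelation comes from: anti-aliased gray edge pixels get forced to one side or the other.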

Dox answered Sep 30 '22 10:09


On OS X, if you've got GraphicConverter (free full trial available last I checked), there's a great way to do this, and crop off a black border that might result from inversion too.

Under File->Convert & Modify (or batch conversion in the options you first get), hit the "Edit Batches" button and choose invert, greyscale, and contrast. Adjust the contrast all the way up (when it gets greyscaled it's all the same). Choose crop too, and set the right border (for my situation it was 720x540); you can check the size first by opening the file and selecting the part you want, since the selected pixel dimensions show up in a little status box.
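If you'd rather not measure the crop size by hand, the border can in principle be found automatically. Below is a small Python sketch of that idea on a made-up grayscale raster (0 is black); it is not how GraphicConverter does it, and the function name crop_black_border and the black_max parameter are assumptions of mine.

```python
def crop_black_border(pixels, black_max=0):
    """Drop outer rows and columns in which every pixel is <= black_max."""
    rows = [i for i, row in enumerate(pixels) if any(v > black_max for v in row)]
    if not rows:
        return []  # the whole image is black, nothing to keep
    cols = [
        j for j in range(len(pixels[0]))
        if any(row[j] > black_max for row in pixels)
    ]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]

# Assumed sample: a 2x2 bright area surrounded by a one-pixel black border
image = [
    [0, 0, 0, 0],
    [0, 255, 200, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
cropped = crop_black_border(image)
print(cropped)
```

Raising black_max a little would also treat near-black border pixels as border.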

I wasn't able to convert from PDF to PDF directly (it only changed the first page of the PDF), but outputting as PNGs did the trick nicely, which let me print images with nice black text on a white background.

Then you've got the batch all set up for the next PDF with this bad color scheme.

Ben answered Sep 30 '22 12:09