
Remove Layers/Background from PDF in PHP/Bash/C#

I have some PDF files that I need to modify using a PHP script. I'm also able to exec() so I can use pretty much anything that runs on CentOS.

The PDF files when opened through Adobe Acrobat Pro X, show 2 layers in the "layers" panel:

  1. Background
  2. Color

When I disable both of these layers, I end up with black-and-white text and images (the text is not vector, though; it's a scanned document).

I want to disable these layers and any other similar layer found in the PDFs using PHP and/or C# or any command-line tool.

Other useful information:

When I run pdfimages (provided with Xpdf) on my PDFs, it extracts exactly the images that I need removed from each page.

Update: I modified the PDFSharp example at http://www.pdfsharp.net/wiki/ExportImages-sample.ashx as follows.

Original (line 28):
ExportImage(xObject, ref imageCount);

Replaced with:
PdfObject obj = xObject.Elements.GetObject("/OC");
Console.WriteLine(obj);

I got the following output in the console for each image:
<< /Name Background /Type /OCG >>
<< /OCGs [ 2234 0 R ] /P /AllOff /Type /OCMD >>
<< /Name Text Color /Type /OCG >>

This is the layer information. The PDFSharp documentation describes the /OC key as follows:

Before the image is processed, its visibility is determined based on this entry. If it is determined to be invisible, the entire image is skipped, as if there were no Do operator to invoke it.
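To make the /P /AllOff entry in the output above concrete, here is a minimal Python sketch of how a membership dictionary (/OCMD) combines the on/off states of the groups in its /OCGs array, following the four visibility policies defined in the PDF specification (AllOn, AnyOn, AnyOff, AllOff). The function name, the plain-dict representation, and the label "ocg_2234" are assumptions made for illustration; this is not PDFSharp or iTextSharp code.

```python
from typing import Dict, List


def ocmd_visible(policy: str, ocgs: List[str], state: Dict[str, bool]) -> bool:
    """Return True if content tagged with this /OCMD would be shown.

    policy: value of the /P entry ("AllOn", "AnyOn", "AnyOff" or "AllOff").
    ocgs:   labels for the groups listed in /OCGs (hypothetical stand-ins
            for indirect references such as "2234 0 R").
    state:  label -> True if that layer is currently ON.
    """
    if not ocgs:
        # With no groups listed, the membership dictionary does not
        # restrict visibility.
        return True
    flags = [state[name] for name in ocgs]
    if policy == "AllOn":
        return all(flags)
    if policy == "AnyOn":
        return any(flags)
    if policy == "AnyOff":
        return not all(flags)
    if policy == "AllOff":
        return not any(flags)
    raise ValueError("unknown visibility policy: " + policy)


# The /OCMD from the console output references one group with /P /AllOff.
shown_when_off = ocmd_visible("AllOff", ["ocg_2234"], {"ocg_2234": False})
shown_when_on = ocmd_visible("AllOff", ["ocg_2234"], {"ocg_2234": True})
```

Under this reading, an image tagged with an /AllOff membership dictionary is drawn only while the referenced group is off, so switching every layer off may reveal such content rather than hide it. It is worth checking the result on a sample page.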

So now, how do I modify the /OC value to something that will make these layers invisible?

Tom asked May 22 '11 17:05


1 Answer

After long hours of experimenting, I found the way! I'm posting the code so someone may find it helpful in the future:

using System;
using System.IO;
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace LayerHide {

    class MainClass
    {
        public static void Main (string[] args)
        {
            // Open the source PDF; the stamper writes the modified copy to test2.pdf.
            PdfReader reader = new PdfReader("test.pdf");
            PdfStamper stamp = new PdfStamper(reader, new FileStream("test2.pdf", FileMode.Create));

            // GetPdfLayers() returns the document's layers keyed by layer name.
            Dictionary<string, PdfLayer> layers = stamp.GetPdfLayers();

            // Switch every layer off so that its content is hidden.
            foreach (KeyValuePair<string, PdfLayer> entry in layers)
            {
                entry.Value.On = false;
            }

            // Closing the stamper writes the output file; then release the reader.
            stamp.Close();
            reader.Close();
        }
    }
}
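For readers who want to see what "off" means in the file itself: in the PDF format, the document catalog's /OCProperties dictionary lists all layers in /OCGs, and its default configuration /D can name the layers that start hidden in an /OFF array. The Python sketch below models that bookkeeping with plain dicts. The function name, the dict layout, and the reference "2235 0 R" are assumptions for illustration, and the sketch does not claim to reproduce exactly what iTextSharp writes.

```python
def hide_all_layers(oc_properties: dict) -> dict:
    """Return a copy of an /OCProperties-like dict with every group in /D /OFF."""
    default = dict(oc_properties.get("D", {}))
    # No group stays explicitly ON; every known group is listed as OFF.
    default.pop("ON", None)
    default["OFF"] = list(oc_properties.get("OCGs", []))
    updated = dict(oc_properties)
    updated["D"] = default
    return updated


# Strings stand in for indirect object references to the two layers.
before = {"OCGs": ["2234 0 R", "2235 0 R"], "D": {"ON": ["2234 0 R"]}}
after = hide_all_layers(before)
```

The function returns a new dict instead of mutating its argument, so the original state stays available for comparison.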
Tom answered Oct 04 '22 01:10