 

Remove links from a PDF using iText 7.1

We have a vendor that will not accept PDFs that contain links. We are trying to remove the links by removing all link annotations from each page of the PDF using iText 7.1 (Java), and have tried multiple techniques based on research. Here are three of our attempts to detect and remove the links. None of them produces a destination PDF (test-no-links.pdf) without links. Any insight would be greatly appreciated.

Example 1: Remove based on class type of annotation

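  import com.itextpdf.kernel.pdf.PdfDocument;
  import com.itextpdf.kernel.pdf.PdfPage;
  import com.itextpdf.kernel.pdf.PdfReader;
  import com.itextpdf.kernel.pdf.PdfWriter;
  import com.itextpdf.kernel.pdf.annot.PdfAnnotation;
  import com.itextpdf.kernel.pdf.annot.PdfLinkAnnotation;
  import java.util.List;

  String src  = "test-with-links.pdf";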
  String dest = "test-no-links.pdf";

  PdfReader   reader  = new PdfReader(src);
  PdfWriter   writer  = new PdfWriter(dest);
  PdfDocument pdfDoc  = new PdfDocument(reader,writer);

  for( int page = 1; page <= pdfDoc.getNumberOfPages(); ++page ) {
    PdfPage               pdfPage     = pdfDoc.getPage(page);
    List<PdfAnnotation>   annots      = pdfPage.getAnnotations();

    if ((annots == null) || (annots.size() == 0)) {
      System.out.println("no annotations on page " + page);
    }
    else {
      for( PdfAnnotation annot : annots ) {
        if( annot instanceof PdfLinkAnnotation ) {
          pdfPage.removeAnnotation(annot);
        }
      }
    }
  }
  pdfDoc.close();

Example 2: Remove based on annotation subtype value

  String src  = "test-with-links.pdf";
  String dest = "test-no-links.pdf";

  PdfReader   reader  = new PdfReader(src);
  PdfWriter   writer  = new PdfWriter(dest);
  PdfDocument pdfDoc  = new PdfDocument(reader,writer);

  for( int page = 1; page <= pdfDoc.getNumberOfPages(); ++page ) {
    PdfPage               pdfPage     = pdfDoc.getPage(page);
    List<PdfAnnotation>   annots      = pdfPage.getAnnotations();

    if ((annots == null) || (annots.size() == 0)) {
      System.out.println("no annotations on page " + page);
    }
    else {
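      for( PdfAnnotation annot : annots ) {
        // only external links (URI or GoToR actions) are removed here
        if ( PdfName.Link.equals(annot.getSubtype()) ) {
          PdfDictionary annotAction = ((PdfLinkAnnotation)annot).getAction();

          // a link may use a /Dest entry instead of an /A action; skip those
          if( annotAction == null ) {
            continue;
          }
          PdfName actionType = annotAction.getAsName(PdfName.S);

          if( PdfName.URI.equals(actionType) || PdfName.GoToR.equals(actionType) ) {
            // GoToR actions carry no /URI entry, so getAsString may return null
            PdfString uri = annotAction.getAsString(PdfName.URI);
            System.out.println("Removing " + (uri != null ? uri.toUnicodeString() : actionType.toString()));
            pdfPage.removeAnnotation(annot);
          }
        }
      }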
    }
  }
  pdfDoc.close();

Example 3: Remove all annotations (ignore annotation type)

  String src  = "test-with-links.pdf";
  String dest = "test-no-links.pdf";

  PdfReader   reader  = new PdfReader(src);
  PdfWriter   writer  = new PdfWriter(dest);
  PdfDocument pdfDoc  = new PdfDocument(reader,writer);

  for( int page = 1; page <= pdfDoc.getNumberOfPages(); ++page ) {
    PdfPage               pdfPage     = pdfDoc.getPage(page);

    // remove all annotations from the page regardless of type
    pdfPage.getPdfObject().remove(PdfName.Annots);
  }
  pdfDoc.close();
asked Mar 28 '26 at 13:03 by Jack D.



1 Answer

Each of your tests generates a PDF without Link annotations.

Probably, though, your PDF viewer recognizes "www.qualpay.com" as a (partial) URL and displays it as a link.

In detail

Your routines

All your tests successfully remove all Link annotations from your sample PDF. Compare the screenshots of the source and the three result files, in particular the Annots entry of page 1:

[Screenshots: page 1 dictionary of test-with-links.pdf, test-no-links.pdf, test-no-links-1.pdf and test-no-links-2.pdf]
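The same check can also be done in code by counting, page by page, the annotations whose /Subtype is /Link. This is a minimal sketch assuming iText 7.1 (kernel module) on the classpath; the class name LinkAnnotationCheck and the method countLinkAnnotations are hypothetical, and the file names are the ones from the question.

```java
// Assumes iText 7.1 (kernel module) on the classpath.
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;

import java.io.IOException;

public class LinkAnnotationCheck {

    // Counts annotations with /Subtype /Link on all pages of an already opened document.
    // getAnnotations() returns an empty list for pages without an /Annots array.
    static int countLinkAnnotations(PdfDocument pdfDoc) {
        int count = 0;
        for (int page = 1; page <= pdfDoc.getNumberOfPages(); ++page) {
            for (PdfAnnotation annot : pdfDoc.getPage(page).getAnnotations()) {
                if (PdfName.Link.equals(annot.getSubtype())) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // File names from the question; adjust to your own paths.
        for (String file : new String[] {"test-with-links.pdf", "test-no-links.pdf"}) {
            // Read-only mode: the document is only inspected, never written.
            try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(file))) {
                System.out.println(file + ": " + countLinkAnnotations(pdfDoc) + " Link annotation(s)");
            }
        }
    }
}
```

If the analysis above holds, this should report a non-zero count for the source file and zero for each result file, even while a viewer still shows the URL as clickable.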

The viewer

Indeed, though, when viewing the PDF in Adobe Acrobat Reader (and also some other viewers, e.g. the built-in PDF viewers of Chrome and Edge), you'll see that "www.qualpay.com" is treated like a link.

The cause is a feature of the PDF viewer itself! It scans the text of the displayed PDF for strings it recognizes as (part of) a URL and displays them like links, whether or not a Link annotation is present.
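To illustrate what such a viewer feature might do, here is a deliberately simplified, hypothetical heuristic in plain Java: it flags scheme-prefixed URLs and bare "www." host names in a piece of page text. Acrobat, Chrome and Edge use their own rules, which this sketch does not claim to reproduce; the class name UrlLikeTextFinder and its method are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlLikeTextFinder {

    // Hypothetical, simplified rule: "http(s)://" or "www." followed by a dotted host name
    // ending in a top-level domain of two or more letters, plus an optional path.
    // Real viewers use their own, undocumented rules.
    private static final Pattern URL_LIKE = Pattern.compile(
            "\\b(?:https?://|www\\.)[A-Za-z0-9.-]+\\.[A-Za-z]{2,}(?:/\\S*)?",
            Pattern.CASE_INSENSITIVE);

    // Returns every URL-like substring of the given text, in order of appearance.
    static List<String> findUrlLikeStrings(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = URL_LIKE.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Plain page text: no annotation is involved, yet the heuristic still finds a "link".
        String pageText = "Questions? Visit www.qualpay.com or call us.";
        System.out.println(findUrlLikeStrings(pageText)); // prints [www.qualpay.com]
    }
}
```

The point of the sketch is that detection works on text alone, which is why removing annotations cannot stop a viewer from presenting such strings as links.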

In Adobe Acrobat Reader you can disable this feature under Preferences / General.

If you disable "Create links from URLs", you'll suddenly find the URLs in your result files inactive while the URL in your source file (with the link annotation) is still active.

What to do

We have a vendor that will not accept PDFs that contain links.

First, ask your vendor what exactly they mean by "PDFs that contain links". Do they mean

  • PDFs with Link annotations or
  • PDFs with URLs that common PDF viewers present like Link annotations?

In the former case you're done: each of your code variants removes the Link annotations. You may have to show the vendor how to disable the URL recognition in Adobe Acrobat Reader, though.

In the latter case you'll have to remove from the text content of your PDFs everything that common PDF viewers recognize as a URL. You may replace each URL with a bitmap image of the URL text, with the URL text drawn as a generic vector graphic (a path of lines and curves that is then filled), or with some similar surrogate.
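Before choosing a surrogate, it may help to list which text a viewer is likely to turn into a link. A rough sketch, assuming iText 7.1 with its kernel module: it extracts each page's text with PdfTextExtractor.getTextFromPage and applies a crude, assumed URL pattern that is not any viewer's actual rule set. UrlTextReport and report are hypothetical names.

```java
// Assumes iText 7.1 (kernel module) on the classpath.
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlTextReport {

    // Crude, assumed heuristic: "http(s)://" or "www." followed by non-whitespace.
    private static final Pattern URL_LIKE = Pattern.compile(
            "\\b(?:https?://|www\\.)\\S+", Pattern.CASE_INSENSITIVE);

    // Returns "page N: <match>" for every URL-like string in the extracted text of each page.
    static List<String> report(PdfDocument pdfDoc) {
        List<String> entries = new ArrayList<>();
        for (int page = 1; page <= pdfDoc.getNumberOfPages(); ++page) {
            String text = PdfTextExtractor.getTextFromPage(pdfDoc.getPage(page));
            Matcher m = URL_LIKE.matcher(text);
            while (m.find()) {
                entries.add("page " + page + ": " + m.group());
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // File name from the question; the document is opened read-only.
        try (PdfDocument pdfDoc = new PdfDocument(new PdfReader("test-no-links.pdf"))) {
            report(pdfDoc).forEach(System.out::println);
        }
    }
}
```

This only locates candidates. Actually replacing that text with an image or a vector path requires editing the page content streams, which the sketch does not attempt.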

answered Mar 30 '26 at 02:03 by mkl



