We have a vendor that will not accept PDFs that contain links. We are trying to remove the links by removing all link annotations from each page of the PDF using iText 7.1 (Java). We have tried multiple techniques based on research. Here are three examples of attempts to detect and remove the links. None of these result in the destination PDF (test-no-links.pdf) having the links removed. Any insight would be greatly appreciated.
Example 1: Remove based on class type of annotation
String src = "test-with-links.pdf";
String dest = "test-no-links.pdf";
PdfReader reader = new PdfReader(src);
PdfWriter writer = new PdfWriter(dest);
PdfDocument pdfDoc = new PdfDocument(reader,writer);
for( int page = 1; page <= pdfDoc.getNumberOfPages(); ++page ) {
PdfPage pdfPage = pdfDoc.getPage(page);
List<PdfAnnotation> annots = pdfPage.getAnnotations();
if ((annots == null) || (annots.size() == 0)) {
System.out.println("no annotations on page " + page);
}
else {
for( PdfAnnotation annot : annots ) {
if( annot instanceof PdfLinkAnnotation ) {
pdfPage.removeAnnotation(annot);
}
}
}
}
pdfDoc.close();
Example 2: Remove based on annotation subtype value
String src = "test-with-links.pdf";
String dest = "test-no-links.pdf";
PdfReader reader = new PdfReader(src);
PdfWriter writer = new PdfWriter(dest);
PdfDocument pdfDoc = new PdfDocument(reader,writer);
for( int page = 1; page <= pdfDoc.getNumberOfPages(); ++page ) {
PdfPage pdfPage = pdfDoc.getPage(page);
List<PdfAnnotation> annots = pdfPage.getAnnotations();
if ((annots == null) || (annots.size() == 0)) {
System.out.println("no annotations on page " + page);
}
else {
for( PdfAnnotation annot : annots ) {
// if this annotation has a link, delete it
if ( annot.getSubtype().equals(PdfName.Link) ) {
PdfDictionary annotAction = ((PdfLinkAnnotation)annot).getAction();
if( annotAction.get(PdfName.S).equals(PdfName.URI) ||
annotAction.get(PdfName.S).equals(PdfName.GoToR) ) {
PdfString uri = annotAction.getAsString(PdfName.URI);
System.out.println("Removing " + uri.toString());
pdfPage.removeAnnotation(annot);
}
}
}
}
}
pdfDoc.close();
Example 3: Remove all annotations (ignore annotation type)
String src = "test-with-links.pdf";
String dest = "test-no-links.pdf";
PdfReader reader = new PdfReader(src);
PdfWriter writer = new PdfWriter(dest);
PdfDocument pdfDoc = new PdfDocument(reader,writer);
for( int page = 1; page <= pdfDoc.getNumberOfPages(); ++page ) {
PdfPage pdfPage = pdfDoc.getPage(page);
// remove all annotations from the page regardless of type
pdfPage.getPdfObject().remove(PdfName.Annots);
}
pdfDoc.close();
Each of your tests generates a PDF without Link annotations.
Probably, though, your PDF viewer recognizes "www.qualpay.com" as (partial) URL and displays it as a link.
All your tests successfully remove all Link annotations from your sample PDF, cf. these screen shots for the source and all three result files, in particular look for the page 1 Annots entry:




Indeed, though, when viewing the PDF in Adobe Acrobat Reader (and also some other viewers, e.g. the built-in PDF viewers of Chrome and Edge), you'll see that "www.qualpay.com" is treated like a link.
The cause is that this is a feature of the PDF viewer! It scans the text of the PDF it displays for strings it recognizes as (a part of) some URL and displays them like links!
In Adobe Acrobat Reader you can disable this feature:

If you disable "Create links from URLs", you'll suddenly find the URLs in your result files inactive while the URL in your source file (with the link annotation) is still active.
We have a vendor that will not accept PDFs that contain links.
First discuss with your vendor what exactly he means by "PDFs that contain links". Does he mean
In the former case you're done, your code (either variant) removes the link annotations. You may have to demonstrate to the vendor how to disable the URL recognition in Adobe Acrobat Reader, though.
In the latter case you'll have to remove everything from the text content of your PDFs that common PDF viewers recognize as URLs. You may replace each URL by a bitmap image of the URL text, or the URL text drawn like a generic vector graphic (defining a path of lines and curves and filling that), or some similar surrogate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With