Get document properties from PDF in iTextSharp

Tags: c#, pdf, itext

I'm trying to get some information out of a PDF file. I've tried using PdfSharp, and it has properties for the information I need, but it cannot open iref streams, so I've had to abandon it.

Instead I'm trying iTextSharp. So far I've managed to get some basic information out, like the title, author and subject, from the Info dictionary.

However, I'm now after a bit more information, but cannot find where it is exposed (if it is exposed) in iTextSharp. The information I am after is highlighted in the image below:

[Image: Info I Need]

I cannot figure out where this information is stored. Any and all help will be much appreciated.

Tom Beech asked Feb 14 '23 02:02



2 Answers

For documents encrypted using standard password encryption, you can retrieve the permissions after opening the file in a PdfReader (called pdfReader here) using

  • getPermissions() in case of iText/Java

      int permissions = pdfReader.getPermissions();
    
  • Permissions in case of iTextSharp/.Net

      int permissions = pdfReader.Permissions;
    

The int value returned is the P value of the encryption dictionary which contains

A set of flags specifying which operations shall be permitted when the document is opened with user access (see Table 22).

[...]

The value of the P entry shall be interpreted as an unsigned 32-bit quantity containing a set of flags specifying which access permissions shall be granted when the document is opened with user access. Table 22 shows the meanings of these flags. Bit positions within the flag word shall be numbered from 1 (low-order) to 32 (high order). A 1 bit in any position shall enable the corresponding access permission.

[...]

Bit position Meaning

3 (Security handlers of revision 2) Print the document. (Security handlers of revision 3 or greater) Print the document (possibly not at the highest quality level, depending on whether bit 12 is also set).

4 Modify the contents of the document by operations other than those controlled by bits 6, 9, and 11.

5 (Security handlers of revision 2) Copy or otherwise extract text and graphics from the document, including extracting text and graphics (in support of accessibility to users with disabilities or for other purposes). (Security handlers of revision 3 or greater) Copy or otherwise extract text and graphics from the document by operations other than that controlled by bit 10.

6 Add or modify text annotations, fill in interactive form fields, and, if bit 4 is also set, create or modify interactive form fields (including signature fields).

9 (Security handlers of revision 3 or greater) Fill in existing interactive form fields (including signature fields), even if bit 6 is clear.

10 (Security handlers of revision 3 or greater) Extract text and graphics (in support of accessibility to users with disabilities or for other purposes).

11 (Security handlers of revision 3 or greater) Assemble the document (insert, rotate, or delete pages and create bookmarks or thumbnail images), even if bit 4 is clear.

12 (Security handlers of revision 3 or greater) Print the document to a representation from which a faithful digital copy of the PDF content could be generated. When this bit is clear (and bit 3 is set), printing is limited to a low-level representation of the appearance, possibly of degraded quality.

(Section 7.6.3.2 "Standard Encryption Dictionary" in the PDF specification ISO 32000-1)
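As a rough illustration of the bit numbering quoted above, here is a minimal sketch in plain C# that decodes a P value without any PDF library. The class name PermissionBits, its members, and the shortened meaning strings are made up for this example and are not part of iTextSharp; the bit positions are the ones from Table 22.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of iTextSharp.
public static class PermissionBits
{
    // Bit positions from Table 22 of ISO 32000-1 (1 = low-order bit),
    // with abbreviated descriptions.
    private static readonly (int Bit, string Meaning)[] Table =
    {
        (3,  "Print"),
        (4,  "Modify contents"),
        (5,  "Copy or extract text and graphics"),
        (6,  "Add or modify annotations, fill in form fields"),
        (9,  "Fill in existing form fields"),
        (10, "Extract for accessibility"),
        (11, "Assemble document"),
        (12, "Print in high quality"),
    };

    // True when the 1-based bit position is set in the P value.
    public static bool IsBitSet(int p, int position) =>
        (p & (1 << (position - 1))) != 0;

    // Lists the meanings of all permission bits that are set in p.
    public static List<string> Describe(int p)
    {
        var granted = new List<string>();
        foreach (var (bit, meaning) in Table)
        {
            if (IsBitSet(p, bit)) granted.Add(meaning);
        }
        return granted;
    }
}
```

For example, the value -1852 (0xFFFFF8C4 as a 32-bit quantity) has bits 3 and 12 set among the permission bits, so PermissionBits.Describe(-1852) should list only the two printing entries.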

To test individual flags in this value, you can use the PdfWriter.ALLOW_* constants as bit masks.

Concerning the dialog screenshot you made, though, be aware that the operations effectively allowed depend not only on the PDF document but also on the PDF viewer! Otherwise you might be caught in the same trap as the OP of this question.

mkl answered Feb 23 '23 18:02



Thanks to mkl for your answer, it was part of the story, but here is the answer which you helped me find:

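// File is the asker's own variable identifying the PDF to open.
using (var pdf = new PdfReader(File))
{
   // Turn one flag of the P value into a bool.
   Console.WriteLine(PdfEncryptor.IsModifyAnnotationsAllowed(pdf.Permissions));
}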

PdfEncryptor is what was missing: it converts the P value into a simple bool for yes or no. Its methods are:

  • IsAssemblyAllowed
  • IsCopyAllowed
  • IsDegradedPrintingAllowed
  • IsFillInAllowed
  • IsModifyAnnotationsAllowed
  • IsModifyContentsAllowed
  • IsPrintingAllowed
  • IsScreenReadersAllowed
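Helpers of this kind presumably boil down to a mask comparison against the P value. The following is a minimal sketch of that idea in plain C#, not the library's actual source: the class PermissionChecks and its member names are hypothetical, and the constant values are assumed to mirror iText's PdfWriter.ALLOW_* constants, so check them against your library version.

```csharp
using System;

// Hypothetical stand-in for illustration; not the iTextSharp implementation.
public static class PermissionChecks
{
    // Assumed to match the corresponding PdfWriter.ALLOW_* values.
    public const int AllowDegradedPrinting = 4;        // bit 3
    public const int AllowPrinting = 4 + 2048;         // bits 3 and 12
    public const int AllowModifyContents = 8;          // bit 4
    public const int AllowModifyAnnotations = 32;      // bit 6

    // A permission counts as granted only if every bit of its mask is set.
    public static bool IsAllowed(int permissions, int mask) =>
        (permissions & mask) == mask;
}
```

Comparing against the whole mask, rather than testing for any nonzero overlap, matters for printing: a value with bit 3 set but bit 12 clear passes the degraded printing check and fails the full printing check.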

As for the security method part, this is what I went with:

using (var pdf = new PdfReader(File))
{
   // Expected is a bool defined elsewhere in the asker's code.
   Console.WriteLine((!pdf.IsOpenedWithFullPermissions) == Expected);
}
Tom Beech answered Feb 23 '23 17:02
