My aim is to look for JavaScript matching a given pattern in the annotations of a PDF. To do so, I have come up with the following code:
public static void main(String[] args) {
    try {
        // Reads and parses a PDF document
        PdfReader reader = new PdfReader("Test.pdf");
        // For each PDF page
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            // Get PDF page i
            PdfDictionary page = reader.getPageN(i);
            // Get all the annotations of page i
            PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);
            // If page does not have annotations
            if (page.getAsArray(PdfName.ANNOTS) == null) {
                continue;
            }
            // For each annotation
            for (int j = 0; j < annotsArray.size(); ++j) {
                // For current annotation
                PdfDictionary curAnnot = annotsArray.getAsDict(j);
                // check if has JS as described below
                PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
                // test if it is a JavaScript action
                if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
                    // what here?
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
As far as I know, comparing strings is done with the StringCompare library. The thing is, it compares two strings, whereas I want to know whether the JavaScript action in an annotation starts with (or contains) this string: if (this.hostContainer) { try {
So, how do I check whether the JavaScript in an annotation contains the above-mentioned string?
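For plain Java strings no extra library should be needed: java.lang.String already distinguishes equals, startsWith, and contains. A minimal sketch (the class ScriptMatcher and its method names are hypothetical), which also shows a whitespace-tolerant regular expression in case a script spaces the same code differently from the search string:

```java
import java.util.regex.Pattern;

// Hypothetical helper: only java.lang.String methods and one java.util.regex pattern
final class ScriptMatcher {
    static final String NEEDLE = "if (this.hostContainer) { try {";

    // Same tokens as NEEDLE, but any amount of whitespace (including line breaks)
    // may appear between them
    private static final Pattern LOOSE_NEEDLE = Pattern.compile(
            "if\\s*\\(\\s*this\\.hostContainer\\s*\\)\\s*\\{\\s*try\\s*\\{");

    // Exact prefix test, ignoring leading whitespace of the script
    static boolean startsWithNeedle(String script) {
        return script.trim().startsWith(NEEDLE);
    }

    // Exact substring test anywhere in the script
    static boolean containsNeedle(String script) {
        return script.contains(NEEDLE);
    }

    // Substring test that tolerates differently spaced code
    static boolean containsNeedleLoosely(String script) {
        return LOOSE_NEEDLE.matcher(script).find();
    }
}
```

The exact checks are cheap and predictable; the regex variant trades some speed for robustness against formatting differences.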
EDIT: A sample page with JS is available here: pdf with JS
JavaScript actions are defined as follows in ISO 32000-1:
12.6.4.16 JavaScript Actions
Upon invocation of a JavaScript action, a conforming processor shall execute a script that is written in the JavaScript programming language. Depending on the nature of the script, various interactive form fields in the document may update their values or change their visual appearances. Mozilla Development Center’s Client-Side JavaScript Reference and the Adobe JavaScript for Acrobat API Reference (see the Bibliography) give details on the contents and effects of JavaScript scripts. Table 217 shows the action dictionary entries specific to this type of action.
Table 217 – Additional entries specific to a JavaScript action
Key | Type | Value
S | name | (Required) The type of action that this dictionary describes; shall be JavaScript for a JavaScript action.
JS | text string or text stream | (Required) A text string or text stream containing the JavaScript script to be executed. PDFDocEncoding or Unicode encoding (the latter identified by the Unicode prefix U+FEFF) shall be used to encode the contents of the string or stream.
To support the use of parameterized function calls in JavaScript scripts, the JavaScript entry in a PDF document’s name dictionary (see 7.7.4, “Name Dictionary”) may contain a name tree that maps name strings to document-level JavaScript actions. When the document is opened, all of the actions in this name tree shall be executed, defining JavaScript functions for use by other scripts in the document.
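A name tree like the one mentioned in the last paragraph consists of nodes with a Kids array of child nodes and leaf nodes with a Names array of alternating keys and values (ISO 32000-1, 7.9.6); a single-node tree may carry Names directly in its root. To search document-level scripts, one would first flatten it into a plain map. Below is a minimal sketch over a hypothetical in-memory model (NameTreeNode, with each value reduced to the script text of its action); real code would have to resolve indirect objects and read each action's JS entry, and with iText 5, PdfNameTree.readTree should cover the flattening step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a name tree node; values are script texts
final class NameTreeNode {
    // Leaf content, laid out like the PDF Names array: key1, value1, key2, value2, ...
    final List<String> names = new ArrayList<>();
    // Child nodes, like the PDF Kids array
    final List<NameTreeNode> kids = new ArrayList<>();

    NameTreeNode name(String key, String script) {
        names.add(key);
        names.add(script);
        return this;
    }

    NameTreeNode kid(NameTreeNode child) {
        kids.add(child);
        return this;
    }

    // Depth-first walk collecting every key/value pair in tree order
    Map<String, String> flatten() {
        Map<String, String> result = new LinkedHashMap<>();
        collect(this, result);
        return result;
    }

    private static void collect(NameTreeNode node, Map<String, String> result) {
        for (int i = 0; i + 1 < node.names.size(); i += 2) {
            result.put(node.names.get(i), node.names.get(i + 1));
        }
        for (NameTreeNode child : node.kids) {
            collect(child, result);
        }
    }
}
```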
Thus, if you want to know whether the JavaScript action in an annotation starts with (or contains) this string: if (this.hostContainer) { try {
then in the situation
if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
    // what here?
}
you will likely want to first check whether AnnotationAction.Get(PdfName.JS) is a PdfString or a PdfStream, in either case retrieve its content as a string, and then check, using the usual string comparison methods, whether it, or any of the functions it calls (such a function might be defined in the JavaScript name tree), contains the string you are searching for.
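Checking "any of the functions it calls" requires some notion of which functions a script calls. The following is only a heuristic sketch under several assumptions: the document-level scripts are already available as a map from name-tree key to script text (for example by flattening the JavaScript name tree), a call is recognized textually as an identifier followed by "(", a definition as "function name(", and only one level of calls is followed. The class and method names are hypothetical; no real JavaScript parsing takes place, so calls made via eval or computed property names are missed.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic; not a JavaScript parser
final class CalledFunctionSearch {
    // Textual call site: an identifier directly followed by "(" (keywords such as
    // "if" also match, which is harmless because no "function if(" exists)
    private static final Pattern CALL = Pattern.compile("\\b([A-Za-z_]\\w*)\\s*\\(");

    // True if the needle occurs in annotScript itself, or in a document-level script
    // that defines (as "function name(") a function called from annotScript.
    static boolean matches(String annotScript, Map<String, String> docScripts, String needle) {
        if (annotScript.contains(needle)) {
            return true;
        }
        Matcher call = CALL.matcher(annotScript);
        while (call.find()) {
            Pattern definition = Pattern.compile(
                    "\\bfunction\\s+" + Pattern.quote(call.group(1)) + "\\s*\\(");
            for (String docScript : docScripts.values()) {
                // Coarse: checks the whole defining script, not just the function body
                if (definition.matcher(docScript).find() && docScript.contains(needle)) {
                    return true;
                }
            }
        }
        return false; // only one level of calls is followed
    }
}
```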
I took your code, cleaned it up a bit (in particular, it was a mix of C# and Java), and added code, as described above, that inspects the JavaScript code contained directly in the annotation's action dictionary:
System.out.println("file.pdf - Looking for special JavaScript actions.");
// Reads and parses a PDF document
PdfReader reader = new PdfReader(resource);
// For each PDF page
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    System.out.printf("\nPage %d\n", i);
    // Get PDF page i
    PdfDictionary page = reader.getPageN(i);
    // Get all the annotations of page i
    PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);
    // If page does not have annotations
    if (annotsArray == null)
    {
        System.out.println("No annotations.");
        continue;
    }
    // For each annotation
    for (int j = 0; j < annotsArray.size(); ++j)
    {
        System.out.printf("Annotation %d - ", j);
        // For current annotation
        PdfDictionary curAnnot = annotsArray.getAsDict(j);
        // Get the action (A entry) of the annotation, if any
        PdfDictionary annotationAction = curAnnot.getAsDict(PdfName.A);
        if (annotationAction == null)
        {
            System.out.print("no action");
        }
        // test if it is a JavaScript action
        else if (PdfName.JAVASCRIPT.equals(annotationAction.get(PdfName.S)))
        {
            PdfObject scriptObject = annotationAction.getDirectObject(PdfName.JS);
            if (scriptObject == null)
            {
                System.out.println("missing JS entry");
                continue;
            }
            final String script;
            if (scriptObject.isString())
                script = ((PdfString)scriptObject).toUnicodeString();
            else if (scriptObject.isStream())
            {
                // Read the stream bytes with all filters (e.g. FlateDecode) applied
                byte[] bytes = PdfReader.getStreamBytes((PRStream)scriptObject);
                script = new String(bytes, "ISO-8859-1");
            }
            else
            {
                System.out.println("malformed JS entry");
                continue;
            }
            if (script.contains("if (this.hostContainer) { try {"))
                System.out.print("contains test string - ");
            System.out.printf("\n---\n%s\n---", script);
        }
        else
        {
            System.out.print("no JavaScript action");
        }
        System.out.println();
    }
}
(Test SearchActionJavaScript, method testSearchJsActionInFile)
The equivalent C# code for iTextSharp:
using (PdfReader reader = new PdfReader(sourcePath))
{
    Console.WriteLine("file.pdf - Looking for special JavaScript actions.");
    // For each PDF page
    for (int i = 1; i <= reader.NumberOfPages; i++)
    {
        Console.Write("\nPage {0}\n", i);
        // Get PDF page i
        PdfDictionary page = reader.GetPageN(i);
        // Get all the annotations of page i
        PdfArray annotsArray = page.GetAsArray(PdfName.ANNOTS);
        // If page does not have annotations
        if (annotsArray == null)
        {
            Console.WriteLine("No annotations.");
            continue;
        }
        // For each annotation
        for (int j = 0; j < annotsArray.Size; ++j)
        {
            Console.Write("Annotation {0} - ", j);
            // For current annotation
            PdfDictionary curAnnot = annotsArray.GetAsDict(j);
            // Get the action (A entry) of the annotation, if any
            PdfDictionary annotationAction = curAnnot.GetAsDict(PdfName.A);
            if (annotationAction == null)
            {
                Console.Write("no action");
            }
            // test if it is a JavaScript action
            else if (PdfName.JAVASCRIPT.Equals(annotationAction.Get(PdfName.S)))
            {
                PdfObject scriptObject = annotationAction.GetDirectObject(PdfName.JS);
                if (scriptObject == null)
                {
                    Console.WriteLine("missing JS entry");
                    continue;
                }
                String script;
                if (scriptObject.IsString())
                    script = ((PdfString)scriptObject).ToUnicodeString();
                else if (scriptObject.IsStream())
                {
                    // Read the stream bytes with all filters applied; Encoding needs "using System.Text;"
                    byte[] bytes = PdfReader.GetStreamBytes((PRStream)scriptObject);
                    script = Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
                }
                else
                {
                    Console.WriteLine("malformed JS entry");
                    continue;
                }
                if (script.Contains("if (this.hostContainer) { try {"))
                    Console.Write("contains test string - ");
                Console.Write("\n---\n{0}\n---", script);
            }
            else
            {
                Console.Write("no JavaScript action");
            }
            Console.WriteLine();
        }
    }
}
When running either version against your sample file, one gets:
file.pdf - Looking for special JavaScript actions.
Page 1
Annotation 0 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);
} catch(e) { console.println(e); }};
---
Annotation 1 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_ix', 0]);
} catch(e) { console.println(e); }};
---
Annotation 2 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_xi', 0]);
} catch(e) { console.println(e); }};
---
Annotation 3 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_3', 0]);
} catch(e) { console.println(e); }};
---
Annotation 4 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_15', 0]);
} catch(e) { console.println(e); }};
---
Annotation 5 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_37', 0]);
} catch(e) { console.println(e); }};
---
Annotation 6 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_57', 0]);
} catch(e) { console.println(e); }};
---
Annotation 7 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_81', 0]);
} catch(e) { console.println(e); }};
---
Annotation 8 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_111', 0]);
} catch(e) { console.println(e); }};
---
Annotation 9 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_136', 0]);
} catch(e) { console.println(e); }};
---
Annotation 10 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_160', 0]);
} catch(e) { console.println(e); }};
---
Annotation 11 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_197', 0]);
} catch(e) { console.println(e); }};
---
Annotation 12 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_179', 0]);
} catch(e) { console.println(e); }};
---
Annotation 13 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_201', 0]);
} catch(e) { console.println(e); }};
---
Annotation 14 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_223', 0]);
} catch(e) { console.println(e); }};
---
Page 2
No annotations.
Page 3
No annotations.
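If the JS entry is a stream rather than a string, its raw bytes are usually compressed, most commonly with the FlateDecode filter, and must be decoded before they can be searched; iText's PdfReader.getStreamBytes takes care of that. For illustration, here is a JDK-only sketch of the same steps for the simplest case, assuming a single /FlateDecode filter without predictor parameters and approximating PDFDocEncoding by ISO-8859-1 (the two differ, for instance, in the range 0x80 to 0x9F). The class name JsStreamText is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper: decode a single-filter FlateDecode JS stream to text
final class JsStreamText {
    // Undoes FlateDecode, which is zlib-wrapped deflate data
    static byte[] inflate(byte[] raw) {
        Inflater inflater = new Inflater();
        inflater.setInput(raw);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input: keep what was decoded so far
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid FlateDecode data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Table 217: UTF-16BE after a FE FF byte order mark, else PDFDocEncoding
    // (approximated here by ISO-8859-1)
    static String decodeText(byte[] bytes) {
        if (bytes.length >= 2 && (bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF) {
            return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
        }
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static String scriptFromFlateStream(byte[] raw) {
        return decodeText(inflate(raw));
    }
}
```

Other filters, filter chains, and predictors would each need their own handling, which is why delegating to the library is the more robust choice.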