
How can I get metadata from a PDF document using pdf.js?

Is there any way to get metadata, such as the author or title, from a PDF document using pdf.js?

In this example: http://mozilla.github.io/pdf.js/web/viewer.html?file=compressed.tracemonkey-pldi-09.pdf, the document properties markup looks like this:

<div class="row">
<span data-l10n-id="document_properties_author">
    Autor:
</span>
<p id="authorField">
    -
</p>
</div>

The authorField is empty. Is there any way to get this information?

Michał Sekunda asked Mar 30 '14 11:03


2 Answers

Using just the PDF.js library, without a third-party viewer, you can get the metadata through the promise returned by getMetadata():

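// PDFJS is the global used by older PDF.js builds. Newer releases expose
// pdfjsLib instead, and getDocument(url) returns a loading task, so the
// equivalent call there is pdfjsLib.getDocument(url).promise.then(...).
var pdfDoc; // declared so the assignment below does not create an implicit global

PDFJS.getDocument(url).then(function (pdfDoc_) {
    pdfDoc = pdfDoc_;
    // getMetadata() returns a promise that resolves with the metadata object
    pdfDoc.getMetadata().then(function (stuff) {
        console.log(stuff); // Metadata object here
    }).catch(function (err) {
        console.log('Error getting metadata');
        console.log(err);
    });

    // Render the first page or whatever here
    // More code . . .
}).catch(function (err) {
    console.log('Error getting PDF from ' + url);
    console.log(err);
});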

I found this out after dumping the pdfDoc object to the console and looking through its functions and properties. I found the method in its prototype and decided to give it a shot. Lo and behold, it worked!
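To answer the original question about author and title more directly, here is a minimal sketch of pulling those fields out of the object that getMetadata() resolves with. It assumes that object has an info property holding the PDF's document information dictionary, with keys such as Title and Author. The helper name summarizeMetadata and the sample values are made up for illustration.

```javascript
// Hypothetical helper (not part of PDF.js): picks common fields out of the
// object that getMetadata() resolves with. Assumes the shape { info, metadata }.
function summarizeMetadata(result) {
  // Fall back to an empty object so a missing info dictionary does not throw.
  const info = (result && result.info) || {};
  return {
    title: info.Title || null,
    author: info.Author || null,
    creator: info.Creator || null,
    producer: info.Producer || null,
  };
}

// Stand-in for a resolved getMetadata() value; the values are invented.
const sample = { info: { Title: 'Example', Author: 'Jane Doe' }, metadata: null };
console.log(summarizeMetadata(sample));
```

With a real document this would be called inside the promise callback, for example pdfDoc.getMetadata().then(function (result) { console.log(summarizeMetadata(result).author); }). Fields the PDF does not set come back as null.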

The Unknown Dev answered Sep 28 '22 06:09


If you are using the default viewer, you can get the document's basic metadata from the PDFViewerApplication.documentInfo object. For example, to get the author, use PDFViewerApplication.documentInfo.Author.
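A small guard can make that lookup safer. The sketch below assumes documentInfo may be unset until the viewer has finished loading a document, and that a PDF may simply not define the requested field. The helper name getViewerInfoField is hypothetical, and the fake object stands in for the real PDFViewerApplication.

```javascript
// Hypothetical helper (not part of the viewer): reads one field from the
// viewer's documentInfo. `app` is expected to be PDFViewerApplication.
function getViewerInfoField(app, field) {
  const info = app && app.documentInfo;
  // Assumption: documentInfo is unset before a document has loaded.
  if (!info || info[field] === undefined) {
    return null;
  }
  return info[field];
}

// Stand-in object for illustration; in the viewer page, pass PDFViewerApplication.
const fakeApp = { documentInfo: { Author: 'Jane Doe' } };
console.log(getViewerInfoField(fakeApp, 'Author'));
console.log(getViewerInfoField(fakeApp, 'Title'));
```

In the viewer page the call would be getViewerInfoField(PDFViewerApplication, 'Author'), made once the document has loaded.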

user3002090 answered Sep 28 '22 05:09