Hello, I'm trying to get the pdfreader module working in Node.js to convert PDFs into text.
When I run it directly from the command line (`node sandbox/pdf.js`, contents below), it works fine:
```javascript
var pdfreader = require('pdfreader');

var rows = {}; // indexed by y-position

function printRows() {
  Object.keys(rows) // => array of y-positions (type: float)
    .sort((y1, y2) => parseFloat(y1) - parseFloat(y2)) // sort float positions
    .forEach((y) => console.log((rows[y] || []).join('')));
}

new pdfreader.PdfReader().parseFileItems('lib/sandbox/book-eric.pdf', function (err, item) {
  if (err) {
    console.error('error:', err);
  } else if (!item || item.page) {
    // end of file (no item), or start of a new page
    printRows();
    // at end of file there is no item, so reading item.page would throw
    if (item) console.log('PAGE:', item.page);
    rows = {}; // clear rows for next page
  } else if (item.text) {
    // accumulate text items into rows object, per line
    (rows[item.y] = rows[item.y] || []).push(item.text);
  }
});
```
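The row-grouping logic in the script does not depend on pdfreader itself, so it can be pulled out into a pure function and checked with hand-made items, without a PDF. The sketch below is a hypothetical refactor (`itemsToLines` is not a pdfreader API) and assumes only the `y` and `text` properties the script already uses.

```javascript
// Hypothetical helper (not part of pdfreader) mirroring rows/printRows:
// groups text items by y-position and returns one string per line, top to bottom.
function itemsToLines(items) {
  const rows = {}; // y-position -> text fragments, in arrival order
  for (const item of items) {
    if (item && item.text) {
      (rows[item.y] = rows[item.y] || []).push(item.text);
    }
  }
  return Object.keys(rows)
    // object keys are strings, so compare numerically ("10" must sort after "9")
    .sort((y1, y2) => parseFloat(y1) - parseFloat(y2))
    .map((y) => rows[y].join(''));
}

// Usage with hand-made items:
const lines = itemsToLines([
  { y: 2.5, text: 'second ' },
  { y: 1.2, text: 'first ' },
  { y: 2.5, text: 'line' },
  { y: 1.2, text: 'line' },
]);
console.log(lines); // [ 'first line', 'second line' ]
```

Keeping the parsing callback thin and the grouping pure makes it easier to tell whether a problem comes from the PDF parser or from your own post-processing.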
However, when I launch it from my Express-based Node.js app with `node app`, I get an error as soon as I require the module:

```javascript
var pdfreader = require('pdfreader');
```

The error is:
```
TypeError: Cannot read property 'userAgent' of undefined
    at detectSyncFontLoadingSupport (eval at <anonymous> (/Users/deemeetree/Documents/Root/InfraNodus/node_modules/pdf2json/lib/pdf.js:60:1), <anonymous>:42060:38)
    at eval (eval at <anonymous> (/Users/deemeetree/Documents/Root/InfraNodus/node_modules/pdf2json/lib/pdf.js:60:1), <anonymous>:42066:5)
```
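It looks like the module it depends on, pdf2json, loads its bundled code with `eval(_fileContent);`, where `_fileContent` is the content of pdf2json's own files, and that evaluated code is what fails: `detectSyncFontLoadingSupport` tries to read `userAgent` from an object that is `undefined` in this environment.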
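Since the script works standalone but fails inside the app, one way to narrow it down is to compare the browser-style globals each entry point exposes before `require('pdfreader')` runs. The helper below is a hypothetical diagnostic, not part of pdfreader or pdf2json; it only reports `typeof` values and assumes the relevant difference lies in the `window`/`navigator` globals that a `userAgent` lookup would go through.

```javascript
// Hypothetical diagnostic (not part of pdfreader/pdf2json): reports which
// browser-style globals exist at this point in the process.
function describeBrowserGlobals() {
  const win = globalThis.window;
  const nav = globalThis.navigator;
  return {
    window: typeof win,
    windowNavigator: win && typeof win === 'object' ? typeof win.navigator : 'n/a',
    navigator: typeof nav,
    userAgent: nav ? typeof nav.userAgent : 'n/a',
  };
}

// Call this in both entry points (e.g. sandbox/pdf.js and app.js), just before
// requiring pdfreader, and compare the two outputs.
console.log(describeBrowserGlobals());
```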
Does anyone know what I could do to make it work?
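The solution was to add this before requiring the pdfreader module:

```javascript
// Give the code pdf2json evaluates a navigator object with a userAgent to read.
global.navigator = {
  userAgent: 'node',
};

// Only touch window if something in the app defines it; assigning to an
// undeclared window throws a ReferenceError in plain Node.js.
if (typeof window !== 'undefined') {
  window.navigator = {
    userAgent: 'node',
  };
}

var pdfreader = require('pdfreader');
```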
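A slightly more defensive variant only fills in `navigator` where it is missing, so it does not replace a real `navigator` object (Node.js 21 and later ship a built-in `globalThis.navigator`) and can safely be called more than once. This is a sketch under those assumptions; `installNavigatorShim` is a hypothetical name, not part of pdfreader or pdf2json.

```javascript
// Hypothetical helper: installs a minimal navigator only where one is missing.
function installNavigatorShim(userAgent = 'node') {
  // Keep an existing navigator (e.g. Node.js 21+) if it already has a userAgent.
  if (!globalThis.navigator || typeof globalThis.navigator.userAgent !== 'string') {
    globalThis.navigator = { userAgent };
  }

  // If the app defines a global window without a navigator, point it at the same object.
  const win = globalThis.window;
  if (typeof win === 'object' && win !== null && !win.navigator) {
    win.navigator = globalThis.navigator;
  }

  return globalThis.navigator;
}

installNavigatorShim();
// This must run before the first require('pdfreader') anywhere in the app:
// const pdfreader = require('pdfreader');

console.log(typeof globalThis.navigator.userAgent); // "string"
```

If the app uses ES modules, `import` statements are evaluated before the importing module's body runs, so the shim has to live in its own module that is imported before pdfreader rather than sitting above the `import` line.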
I hope this helps others; I spent two hours troubleshooting it.