I'm trying to extract the text of a PDF from the PDF's URL. Following the example on the pdf.js website, I understand how to render a PDF on the client side, but I'm running into issues when I do this server-side.
I installed the package using npm i pdfjs-dist.
I tried the code below as a simple example to load the PDF:
var url = 'https://raw.githubusercontent.com/mozilla/pdf.js/ba2edeae/examples/learning/helloworld.pdf';
var pdfjsLib = require("pdfjs-dist");
var loadingTask = pdfjsLib.getDocument(url);
loadingTask.promise.then(function (pdf) {
  console.log(pdf);
}).catch(function (error) {
  console.log(error);
});
But when I run this, I get the following error:
message: 'The browser/environment lacks native support for critical functionality used by the PDF.js library (e.g. `ReadableStream` and/or `Promise.allSettled`); please use an ES5-compatible build instead.',
name: 'UnknownErrorException',
details: 'Error: The browser/environment lacks native support for critical functionality used by the PDF.js library (e.g. `ReadableStream` and/or `Promise.allSettled`); please use an ES5-compatible build instead.'
Any ideas on how to go about doing this? All I'm trying to do is extract the text of a PDF from its URL, and I'm trying to do this server-side using Node.js. Appreciate any input!
I faced the same issue with the latest version of pdfjs-dist (2.8.335) while using it in a Node.js project. As mentioned in other answers, the fix is to change the require path.
But in my case the path pdfjs-dist/es5/build/pdf
didn't work.
In the latest version it has changed to pdfjs-dist/legacy/build/pdf.js
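For what it's worth, here is a minimal sketch of how the text extraction itself might look once the library loads. The function name extractText is my own, not part of pdf.js, and the library object is passed in as a parameter so you can require whichever build path works for your installed version (pdfjs-dist/legacy/build/pdf.js is what worked for me on 2.8.335; other versions may differ). It relies only on getDocument, numPages, getPage and getTextContent.

```javascript
// Sketch: collect the text of every page of a PDF.
// `extractText` is a hypothetical helper name; `pdfjsLib` is passed in
// so the caller decides which pdfjs-dist build to load.
async function extractText(pdfjsLib, source) {
  // getDocument returns a loading task; its `promise` resolves to the document.
  const pdf = await pdfjsLib.getDocument(source).promise;
  const pages = [];
  // Pages are numbered starting from 1.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Each text item carries its string in `str`.
    pages.push(content.items.map((item) => item.str).join(" "));
  }
  // One line of output per page.
  return pages.join("\n");
}

// Usage (assumes pdfjs-dist 2.8.x is installed):
//   const pdfjsLib = require("pdfjs-dist/legacy/build/pdf.js");
//   extractText(pdfjsLib, url).then(console.log).catch(console.error);
```

Joining items with a space is a simplification: it ignores the position data on each item, so words split across items may pick up extra spaces.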