
Reading pdf from url with node.js using PDF.js

I'm trying to extract the text of a PDF from its URL. Following the example on the PDF.js website, I understand how to render a PDF client-side, but I run into problems when I try to do the same thing server-side.

I downloaded the package using npm i pdfjs-dist

I tried the code below as a simple example to load the pdf:

var url = 'https://raw.githubusercontent.com/mozilla/pdf.js/ba2edeae/examples/learning/helloworld.pdf';
var pdfjsLib = require("pdfjs-dist");
var loadingTask = pdfjsLib.getDocument(url);

loadingTask.promise.then(function (pdf) {
    console.log(pdf);
}).catch(function (error) {
    console.log(error);
});

But when I run this, I get the following error:

  message: 'The browser/environment lacks native support for critical functionality used by the PDF.js library (e.g. `ReadableStream` and/or `Promise.allSettled`); please use an ES5-compatible build instead.',
  name: 'UnknownErrorException',
  details: 'Error: The browser/environment lacks native support for critical functionality used by the PDF.js library (e.g. `ReadableStream` and/or `Promise.allSettled`); please use an ES5-compatible build instead.'

Any ideas on how to go about this? All I'm trying to do is extract the text of a PDF from its URL, server-side, using Node.js. I'd appreciate any input!

asked Oct 03 '20 21:10 by Neeraj Kulkarni


1 Answer

I ran into the same issue with pdfjs-dist 2.8.335 (the latest version at the time of writing) in a Node.js project. As other answers mention, the fix is to change the path you require.

In my case, however, the path pdfjs-dist/es5/build/pdf didn't work.

In the latest version it has changed to pdfjs-dist/legacy/build/pdf.js
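
To tie this back to the original goal of extracting text, here is a minimal sketch of how the legacy build might be used. It assumes pdfjs-dist 2.8.335 and the require path given above; other versions may lay out their files differently. The helper name extractPdfText is hypothetical and not part of PDF.js. It takes the library object as a parameter so the same function works with whichever build path loads in your environment.

```javascript
// Hypothetical helper (not part of PDF.js): collects the text of every page.
// `pdfjsLib` is the object returned by require()-ing a pdfjs-dist build;
// `source` is anything getDocument() accepts, such as a URL string.
async function extractPdfText(pdfjsLib, source) {
  const pdf = await pdfjsLib.getDocument(source).promise;
  const pages = [];

  // PDF.js page numbers start at 1, not 0.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Each text item carries its string in `str`; join them with spaces.
    pages.push(content.items.map((item) => item.str).join(" "));
  }

  // One line of output per page.
  return pages.join("\n");
}

// Usage (assumes pdfjs-dist 2.8.335 is installed and the URL is reachable):
//
//   const pdfjsLib = require("pdfjs-dist/legacy/build/pdf.js");
//   const url = "https://raw.githubusercontent.com/mozilla/pdf.js/ba2edeae/examples/learning/helloworld.pdf";
//   extractPdfText(pdfjsLib, url).then(console.log).catch(console.error);
```

Joining items with a single space is a simplification: it ignores the position data PDF.js returns with each item, so column layouts or hyphenated line breaks may not come out cleanly.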

answered Nov 09 '22 23:11 by Abhay Sehgal