
pdf2json gives me a blank output txt file?

I am following the "Code Example" guide on their GitHub page: https://github.com/modesty/pdf2json#code-example

In the example titled "Parse a PDF then write a .txt file (which only contains textual content of the PDF)", I copied and pasted the exact implementation into a local JavaScript file and ran it, but the output text file was completely blank.

'use strict';

let fs = require('fs');
let PDFParser = require("pdf2json");

let pdfParser = new PDFParser();

pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError) );
pdfParser.on("pdfParser_dataReady", pdfData => {
    fs.writeFile("./node_modules/pdf2json/test/F1040EZ.content.txt", pdfParser.getRawTextContent());
});

pdfParser.loadPDF("./node_modules/pdf2json/test/pdf/fd/form/F1040EZ.pdf");

Am I doing something wrong, or is this a bug on their side? Also, are there any alternative PDF-to-text converters for Node.js that don't require additional binaries to be installed?

asked Jun 10 '16 21:06 by ThePumpkinMaster


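1 Answer

The front-page documentation is slightly wrong. To make this work, pass two arguments to the PDFParser constructor: null (or this, as in the code below) and 1. The 1 appears to switch on raw text collection, which getRawTextContent() needs.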

This one works:

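var fs = require("fs");

// https://github.com/modesty/pdf2json
var PDFParser = require("./node_modules/pdf2json/PDFParser");
// The second argument (1) is the fix: without it the raw text comes back empty.
var pdfParser = new PDFParser(this, 1);

pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
pdfParser.on("pdfParser_dataReady", pdfData => {
    // Current Node.js versions require a callback for fs.writeFile.
    fs.writeFile("./content.txt", pdfParser.getRawTextContent(), err => {
        if (err) console.error(err);
    });
});

// Same sample PDF as in the question.
pdfParser.loadPDF("./node_modules/pdf2json/test/pdf/fd/form/F1040EZ.pdf");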

HTH -XDVarpunen

Link to issue in pdf2json: https://github.com/modesty/pdf2json/issues/76
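
If you would rather not nest the file write inside the event handler, the same flow can be wrapped in a Promise. This is only a sketch: extractRawText and pdfToTextFile are hypothetical helper names, not part of pdf2json, and the code assumes nothing about the parser beyond the events and methods already used above (pdfParser_dataReady, pdfParser_dataError, loadPDF, getRawTextContent). It also fails loudly when the text comes back empty, which is the symptom from the question.

```javascript
const fs = require("fs");

// Hypothetical helper: resolves with the raw text once the parser reports
// that its data is ready, and rejects on a parser error.
function extractRawText(parser, pdfPath) {
  return new Promise((resolve, reject) => {
    parser.on("pdfParser_dataError", errData => reject(errData.parserError));
    parser.on("pdfParser_dataReady", () => resolve(parser.getRawTextContent()));
    parser.loadPDF(pdfPath);
  });
}

// Hypothetical helper: writes the extracted text to outPath and returns
// the number of characters written.
async function pdfToTextFile(parser, pdfPath, outPath) {
  const text = await extractRawText(parser, pdfPath);
  if (text.trim() === "") {
    // A blank result may mean the parser was created without the
    // raw-text argument, e.g. new PDFParser() instead of new PDFParser(null, 1).
    throw new Error("Parser returned no text for " + pdfPath);
  }
  await fs.promises.writeFile(outPath, text);
  return text.length;
}

// Usage (assumes pdf2json is installed):
//   const PDFParser = require("pdf2json");
//   pdfToTextFile(new PDFParser(null, 1), "./some.pdf", "./content.txt")
//     .then(n => console.log("wrote " + n + " characters"))
//     .catch(console.error);
```

Passing the parser in as an argument keeps the helpers independent of how pdf2json is required, so the require path from either snippet above works unchanged.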

answered Nov 02 '22 05:11 by xdvarpunen