
How to alter the text shown by pdf.js?

I'm not trying to modify the PDF; I'm just trying to change the displayed text.

pdf.js outputs the text it reads into a set of divs (.textLayer > div), and it also draws the page on a canvas.

I read here that viewing and editing pdf in the browser is almost impossible, but...

Since pdf.js does have an API, my idea is to "hook" into pdf.js and change the displayed text (that's more than enough in my case).

The closest I could find is a function named getTextContent(), but there are no callbacks registered, AFAICS.

Is this even possible (without messing with pdf.js itself)? If so, how?


EDIT (3)

This code prints the PDF text to the console, but how to proceed from there is a mystery to me.

'use strict';

// In production, the bundled pdf.js shall be used instead of SystemJS.
Promise.all([System.import('pdfjs/display/api'),
System.import('pdfjs/display/global'),
System.import('pdfjs/display/network'),
System.resolve('pdfjs/worker_loader')])
    .then(function (modules)
    {
        var api = modules[0], global = modules[1];

        // In production, change this to point to the built `pdf.worker.js` file.
        global.PDFJS.workerSrc = modules[3];

        // Fetch the PDF document from the URL using promises
        let loadingTask        = api.getDocument('cv.pdf');

        loadingTask.onProgress = function (progressData) {
            document.getElementById('progress').innerText = (progressData.loaded / progressData.total);
        };

        loadingTask.then(function (pdf)
        {
            // Fetch the page.
            pdf.getPage(1).then(function (page)
            {
                var scale     = 1.5;
                var viewport  = page.getViewport(scale);

                // Prepare canvas using PDF page dimensions.
                var canvas    = document.getElementById('pdf-canvas');
                var context   = canvas.getContext('2d');
                canvas.height = viewport.height;
                canvas.width  = viewport.width;

                // (Debug) Get PDF text content
                page.getTextContent().then(function (textContent)
                {
                    console.log(textContent);
                });

                // Render PDF page into canvas context.
                var renderContext =
                {
                    canvasContext: context,
                    viewport     : viewport
                };
                page.render(renderContext);
            });
        });
    });

EDIT (2)

The code example that I'm trying to modify is viewer.js. Granted, it's not the easiest example, but it's the simplest one I could find that implements text in the DOM.


EDIT (1)

I did try to manipulate the DOM (specifically the .textLayer > div elements I mentioned earlier), but pdf.js uses both divs and a canvas to do its magic; it's not just text. The result was the text div shown on top of the canvas (or the other way around), see:

http://imgur.com/a/2hoZZ

TheDude asked Aug 15 '17 05:08





1 Answer

The effect described in the first edit occurs because pdf.js uses hidden div elements to enable text selection, while the visible text is drawn on the canvas. To prevent pdf.js from drawing text on the canvas without modifying the script, you can add the following code:

CanvasRenderingContext2D.prototype.strokeText = function () { };
CanvasRenderingContext2D.prototype.fillText = function () { };
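
If overriding these methods permanently is too invasive (any other canvas on the page loses its text too), the same idea can be wrapped so that it can be undone. The following is only a sketch: suppressCanvasText is a hypothetical helper name, not part of pdf.js, and it only affects text that is actually drawn through fillText/strokeText.

```javascript
// Hypothetical helper (not a pdf.js API): replace the text-drawing methods
// on a 2D context prototype with no-ops and return a function that
// restores the original methods.
function suppressCanvasText(proto) {
    var original = {
        fillText: proto.fillText,
        strokeText: proto.strokeText
    };

    proto.fillText = function () {};
    proto.strokeText = function () {};

    return function restore() {
        proto.fillText = original.fillText;
        proto.strokeText = original.strokeText;
    };
}

// Browser usage, assuming `page` and `renderContext` from the question's
// code and that the render task exposes a `promise` property:
//
// var restore = suppressCanvasText(CanvasRenderingContext2D.prototype);
// page.render(renderContext).promise.then(restore, restore);
```

Restoring in both the success and the failure handler keeps the prototype from staying patched if rendering fails.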

Also, if you want to avoid manipulating the text in the HTML elements, you can render the text yourself, using the same method you use to print it to the console. Here is a working jsfiddle that changes Hello, world! to Burp! :)

The jsfiddle was created from the following resources:

  • Text rendering - http://bl.ocks.org/hubgit/600ec0c224481e910d2a0f883a7b98e3
  • SO question for hiding text - In PDF.js, how do you hide the canvas and display the underlying text at full opacity?
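
The "render the text yourself" step starts from the object that getTextContent() resolves with, whose items array holds one entry per text run with the text in its str property. A minimal sketch of the substitution step follows; replaceTextItems and the sample object are hypothetical and only illustrate the idea. Note that a phrase split across several items would not be matched by this approach.

```javascript
// Hypothetical helper (not a pdf.js API): return a copy of a
// getTextContent() result with each item's `str` rewritten according to a
// { from: to } replacement map. The input object is left untouched.
function replaceTextItems(textContent, replacements) {
    var items = textContent.items.map(function (item) {
        var copy = Object.assign({}, item);
        Object.keys(replacements).forEach(function (from) {
            // split/join replaces every occurrence without regex escaping.
            copy.str = copy.str.split(from).join(replacements[from]);
        });
        return copy;
    });
    return Object.assign({}, textContent, { items: items });
}

// Hand-made object shaped like a getTextContent() result (assumed data).
var sample = { items: [{ str: 'Hello, world!' }, { str: 'Page 1' }] };
var changed = replaceTextItems(sample, { 'Hello, world!': 'Burp!' });

console.log(changed.items.map(function (item) { return item.str; }));
```

In the browser, this would be called inside page.getTextContent().then(...), and the changed items would then be positioned in your own divs instead of the ones pdf.js generates.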
vl4d1m1r4 answered Oct 05 '22 22:10
