
My text is written in a bad direction when I use a template

I want to add text to an existing PDF using Rails, so I did:

filename = "#{Rails.root}/app/assets/images/sample.pdf"
Prawn::Document.generate("#{Rails.root}/app/assets/images/full_template.pdf", :template => filename) do
  text "Test", :align => :center
end

When I open full_template.pdf, I see my template PDF plus my text "Test", but the text is mirrored, as if it had been written in a mirror.

You can find the two PDF documents here:

Original : http://www.sebfie.com/wp-content/uploads/sample.pdf

Generated : http://www.sebfie.com/wp-content/uploads/full_template.pdf

Sebastien asked Aug 22 '12 15:08


2 Answers

Let's see... [switching into PDF debugging mode].

First, I unpack your full_template.pdf with the help of qpdf, a command-line utility "that does structural, content-preserving transformations on PDF files" (self-description):

qpdf --qdf full_template.pdf qdf---test.pdf

The result, qdf---test.pdf, is now easier to analyse in a normal text editor, because all streams are unpacked.

Searching for the string "est" finds us this line:

[(T) 120 (est)] TJ

Poking around a bit more (and looking at qpdf's very helpful comments sprinkled into its output!) we find this: the PDF object that holds your mirrored string "Test" in full_template.pdf is number 22. It is a completely separate object from the rest of the file's text, and it is also the only one that uses an un-embedded Helvetica font.

So let's extract that object separately from your full_template.pdf:

qpdf --show-object=22 --filtered-stream-data full_template.pdf 

 q
 /DeviceRGB cs
 0.000 0.000 0.000 scn
 /DeviceRGB CS
 0.000 0.000 0.000 SCN
 1 w
 0 J
 0 j
 [ ] 0 d

 BT
 286.55 797.384 Td
 /F3.0 12 Tf
 [<54> 120 <657374>] TJ
 ET

 Q

OK, here the piece [(T) 120 (est)] TJ appears as [<54> 120 <657374>] TJ. We verify this with the help of the ascii command, which prints a nice ASCII <-> Hex table. That table confirms:

T  54
e  65
s  73
t  74
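The same check can be sketched in Ruby instead of reading it off a table. This is only an illustration: the helper name decode_pdf_hex is made up here, and it assumes a simple one-byte-per-character font encoding, which holds for the added Helvetica text but not for the two-byte glyph codes used elsewhere in this file.

```ruby
# Decode PDF hex strings (written between angle brackets) into characters.
# Hypothetical helper; assumes one byte per character.
def decode_pdf_hex(hex_string)
  digits = hex_string.delete("<> \n")
  digits += "0" if digits.length.odd? # an odd final digit is padded with 0
  [digits].pack("H*")
end

# The two string fragments from: [<54> 120 <657374>] TJ
parts = ["<54>", "<657374>"]
decoded = parts.map { |part| decode_pdf_hex(part) }.join
puts decoded
```

The number 120 between the two fragments is not text; in a TJ array it is a positioning adjustment between glyphs, so it is left out of the decoding.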

What do the other operators mean? We look them up in the official ISO 32000 PDF-1.7 spec, Annex A, "Operator Summary". Here we find the following bits of info:

 q   : gsave
 Q   : grestore
 cs  : setcolorspace for nonstroking ops
 CS  : setcolorspace for stroking ops
 scn : setcolor for nonstroking ops
 SCN : setcolor for stroking ops
 w   : setlinewidth
 j   : setlinejoin
 J   : setlinecap
 d   : setdash
 BT  : begin text object
 Td  : move text position
 Tf  : set text font and size
 TJ  : show text allowing individual glyph positioning
 Tj  : show text
 ET  : end text object

Nothing suspicious so far...

However, looking at the object that holds the original page content, object number 5, we discover a difference. For example:

1 0 0 -1 -17.2308 -13.485 Tm
<0013001c001200130018001200140015> Tj

Here, before every single Tj (show text), a Tm operator (What is this?!?) is in play. Let's also look up Tm in the PDF spec:

 Tm  : set text matrix and text line matrix

What is strange, however, is that this matrix uses 1 0 0 -1 (instead of the more common 1 0 0 1). This leads to upside-down mirroring of the text.
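To see why 1 0 0 -1 flips things, here is a small Ruby sketch of how a six-number PDF matrix [a b c d e f] maps a point. The helper name apply_matrix and the sample point are made up for illustration; the translation part is set to zero to keep the arithmetic obvious.

```ruby
# Apply a PDF matrix [a b c d e f] to a point (x, y):
#   x' = a*x + c*y + e
#   y' = b*x + d*y + f
def apply_matrix(matrix, x, y)
  a, b, c, d, e, f = matrix
  [a * x + c * y + e, b * x + d * y + f]
end

upright = [1, 0, 0, 1, 0, 0]  # the "more common" 1 0 0 1
flipped = [1, 0, 0, -1, 0, 0] # the 1 0 0 -1 seen in object 5

top_of_glyph = [0, 10] # hypothetical point 10 units above the baseline

p apply_matrix(upright, *top_of_glyph) # => [0, 10]
p apply_matrix(flipped, *top_of_glyph) # => [0, -10]
```

With d = -1 every y value changes sign, so whatever sat above the baseline ends up below it: a vertical mirror.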

Wait a minute!?!

The original text content is stroked with a mirroring text matrix, but still appears normal?? And your added text doesn't use any text matrix of its own, yet appears mirrored? What is going on?!

I'm not going to trace this any further for now. My assumption, however, is that somewhere in the guts of the original PDF, the authoring software defined an 'extended graphics state' which causes all stroking operations to be mirrored by default.

It seems you've done nothing wrong, Sebastien: you've just been unlucky with your choice of a test object, and got blessed with a rather weird one. Try continuing your 'Prawn' experiments with some other PDFs first...

One can "fix" your full_template.pdf by replacing this line in qdf---test.pdf:

286.55 797.384 Td

by this one:

1 0 0 -1 286.55 797.384 Tm

and then running a last qpdf command to fix the PDF cross-reference table and stream lengths (now corrupted by our editing):

qpdf qdf---test.pdf full_template---fixed.pdf

The console output will show you what it does:

  WARNING: qdf---test.pdf: file is damaged
  WARNING: qdf---test.pdf (file position 151169): xref not found
  WARNING: qdf---test.pdf: Attempting to reconstruct cross-reference table
  WARNING: qdf---test.pdf (object 8 0, file position 9072): attempting to recover stream length
  qpdf: operation succeeded with warnings; resulting file may have some problems

The "fixed" PDF will show the text un-mirrored.
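If you would rather not do the text-editor step by hand, the replacement can be sketched in Ruby. This is only an illustration of the manual edit described above: the method name flip_added_text is made up, and the coordinates are the ones from this particular file, so it will not match any other PDF.

```ruby
# Swap the "Td" positioning line of the added text for an explicit,
# vertically flipped text matrix ("Tm"). Hypothetical helper.
def flip_added_text(qdf_source)
  qdf_source.sub("286.55 797.384 Td", "1 0 0 -1 286.55 797.384 Tm")
end

# A fragment of object 22 as it appears in the unpacked file.
sample = <<~PDF
  BT
  286.55 797.384 Td
  /F3.0 12 Tf
  [<54> 120 <657374>] TJ
  ET
PDF

patched = flip_added_text(sample)
puts patched
```

For the real file you would read qdf---test.pdf with File.binread, pass the contents through the method, and write the result back with File.binwrite before running the qpdf command shown above.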

Kurt Pfeifle answered Nov 20 '22 11:11


My Pull Request has been merged, so the issue is now fixed in the prawn-templates gem. The fix was to reset the graphics state before adding any content to the PDF.

This was happening because Google Chrome and Google Docs export PDFs with a transformation matrix that vertically flips all of the content. By default, the PDF coordinate system has its origin at the bottom-left corner of the page. Google's custom transformation means that they can calculate coordinates from the top-left corner of the PDF instead, which does make more sense to me.
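This also resolves the puzzle from the other answer, and a few lines of Ruby make it concrete. The sketch below is an illustration under assumptions: the page height of 842 points is a made-up value, the flip is modelled as the matrix 1 0 0 -1 0 842, and the helper name concat_matrices is hypothetical.

```ruby
# Multiply two PDF matrices given as [a b c d e f]. The result applies
# `first` and then `second` (row-vector convention of the PDF spec).
def concat_matrices(first, second)
  a1, b1, c1, d1, e1, f1 = first
  a2, b2, c2, d2, e2, f2 = second
  [
    a1 * a2 + b1 * c2,
    a1 * b2 + b1 * d2,
    c1 * a2 + d1 * c2,
    c1 * b2 + d1 * d2,
    e1 * a2 + f1 * c2 + e2,
    e1 * b2 + f1 * d2 + f2
  ]
end

page_flip  = [1, 0, 0, -1, 0, 842] # assumed page-level flip, height 842
text_flip  = [1, 0, 0, -1, 0, 0]   # the 1 0 0 -1 text matrix of the template text
plain_text = [1, 0, 0, 1, 0, 0]    # added text without a flipping text matrix

p concat_matrices(text_flip, page_flip)  # => [1, 0, 0, 1, 0, 842]
p concat_matrices(plain_text, page_flip) # => [1, 0, 0, -1, 0, 842]
```

Two flips cancel out, so the template's own text ends up upright (fourth value 1). Text added on top without its own flip inherits only the page-level flip (fourth value -1) and comes out mirrored, which is why resetting the graphics state before adding content fixes it.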

P.S. Thanks very much to @KurtPfeifle for the very helpful answer! I wouldn't have got this far without that information.

ndbroadbent answered Nov 20 '22 11:11