How to cut-paste from PDF with non-ASCII encoding?

I have some PDFs and I am trying to copy text from them in Acrobat Reader and paste it into an HTML form. I suspect some of these files use Unicode for text encoding, because when I paste into the HTML form (in Firefox) I get little boxes with hex characters in them rather than readable text. The problem is not that the PDF has not been OCRed: when I try to OCR it in Acrobat Pro, it says it can't because the file already contains renderable text. Is there any way to deal with this? For example, could I add some sort of JavaScript to the form that would do the conversion?

asked Feb 04 '12 18:02

Steve

1 Answer

Are you able to paste text copied from the file into other programs, such as Notepad or Word?

Some PDF files, even ones produced by Adobe tools, are created without information that is crucial for extracting text from them. Specifically, such files do not contain glyph-to-character mapping information, so a viewer can draw each glyph but cannot tell which character it stands for.

Such files display and print just fine, but text from them can't be properly copied or extracted.

For example, Distiller produces such files when the "Smallest File Size" preset is used.
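
On the JavaScript idea: if the mapping is missing from the file, a script on the form has nothing to convert from, so it cannot reliably recover the original characters. The most it could realistically do is notice that pasted text looks unmapped and warn the user. Here is a rough sketch of that kind of check, written in Python for illustration. The names `suspect_ratio` and `looks_unmapped` and the 0.3 threshold are made up for this example, and it assumes the garbage arrives as private-use, unassigned, or control code points, which is not guaranteed (some files paste as ordinary but wrong letters, and this check would miss those).

```python
import unicodedata

# Unicode general categories that typically render as boxes:
# Co = private use, Cn = unassigned, Cc = control characters.
SUSPECT_CATEGORIES = {"Co", "Cn", "Cc"}

# Control characters that are normal in pasted text and should not count.
ORDINARY_WHITESPACE = {"\n", "\r", "\t"}


def suspect_ratio(text: str) -> float:
    """Return the fraction of characters that fall in a suspect category."""
    if not text:
        return 0.0
    suspect = sum(
        1
        for ch in text
        if ch not in ORDINARY_WHITESPACE
        and unicodedata.category(ch) in SUSPECT_CATEGORIES
    )
    return suspect / len(text)


def looks_unmapped(text: str, threshold: float = 0.3) -> bool:
    """Guess whether pasted text came from a PDF lacking character mappings.

    The threshold is an arbitrary illustrative value, not a tested one.
    """
    return suspect_ratio(text) >= threshold


if __name__ == "__main__":
    pasted = "\uf041\uf042\uf043"  # three private-use code points
    if looks_unmapped(pasted):
        print("Pasted text looks unmapped; the PDF may need OCR or re-export.")
```

The same logic would port to a paste handler in the form, but it only detects the problem. Fixing it means going back to the PDF itself, for example by regenerating it with different settings or running OCR on a rasterized copy.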

answered Sep 21 '22 11:09

Bobrovsky

Tags:

pdf

unicode

acrobat
