Extract everything from PDF [closed]

1 Answer

It sounds like, with a few days or weeks of effort, you could adapt the open source tools to your needs. Fonts and everything else can certainly be extracted; every PDF reader has to do this anyway in order to display the document.

You should probably take your programmer cost ($/hr) and multiply it by the estimated time it would take to add the needed functionality to the open source tools (60-80 hours?). If the result is greater than or close to $5000 anyway, you might consider just buying the commercial software.
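As a rough illustration of that comparison, here is a minimal Python sketch. The $75 hourly rate and the 0.8 "closeness" factor (treating anything within 20% of the license price as "close") are assumptions made up for this example; only the 60-80 hour range and the $5000 figure come from the estimate above.

```python
def development_cost(hourly_rate: float, hours: float) -> float:
    """Cost of extending the open source tools in-house."""
    return hourly_rate * hours


def prefer_commercial(hourly_rate: float, hours: float,
                      license_price: float = 5000.0,
                      closeness: float = 0.8) -> bool:
    """True when the in-house cost is greater than or close to the license price.

    `closeness` is a hypothetical knob: 0.8 means "within 20% counts as close".
    """
    return development_cost(hourly_rate, hours) >= closeness * license_price


# Check both ends of the 60-80 hour estimate at an assumed $75/hr.
for hours in (60, 80):
    cost = development_cost(75.0, hours)
    print(f"{hours} h -> ${cost:.0f}, buy commercial: {prefer_commercial(75.0, hours)}")
```

Plug in your own rate and hour estimate; if the answer flips between the low and high ends of the range, the decision is close enough that other factors (support, maintenance) probably matter more.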

Otherwise, with the help of the (quite good) PDF reference, you should be well on your way.

One more thing: you might find Poppler helpful. It is a PDF rendering library, but rendering is closely related to what you are trying to do.
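As a starting point, Poppler ships command-line utilities (pdftotext, pdfimages, pdffonts) that already cover part of the job. The sketch below is only one possible way to drive them from Python; the function names and the output layout are hypothetical choices for this example. Note that pdffonts only lists the fonts a document uses; it does not write out the embedded font files, so that part would still need custom work.

```python
import shutil
import subprocess
from pathlib import Path


def poppler_commands(pdf: str, out_dir: str) -> dict[str, list[str]]:
    """Build the Poppler command lines for text, images and font listing."""
    out = Path(out_dir)
    stem = Path(pdf).stem
    return {
        # -layout tries to preserve the physical layout of the text
        "text": ["pdftotext", "-layout", pdf, str(out / f"{stem}.txt")],
        # -all writes each image in its native format where possible
        "images": ["pdfimages", "-all", pdf, str(out / stem)],
        # prints a table of the fonts used (names, types, embedded or not)
        "fonts": ["pdffonts", pdf],
    }


def extract_all(pdf: str, out_dir: str) -> dict[str, str]:
    """Run each tool and return its standard output, keyed by task name."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("Poppler utilities not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    results = {}
    for name, cmd in poppler_commands(pdf, out_dir).items():
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        results[name] = done.stdout
    return results
```

Calling extract_all("report.pdf", "out") would, assuming the utilities are installed, leave out/report.txt and the extracted images in out/, and return the pdffonts table under the "fonts" key.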

151

answered Oct 06 '22 23:10

Adam Goode



Tags: text, image, pdf, extract

Maksym
