What I need is to read a PDF, make some transformations (generate TOC bookmarks) and write it back. I found http://hackage.haskell.org/package/HPDF, but it only mentions generating PDFs, not parsing them (although I could have missed it). Haskell is chosen purely for (self-)educational purposes.
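
In PDF terms, TOC bookmarks are the document outline: an /Outlines root dictionary referenced from the catalog, plus one dictionary per entry linked to its siblings with /Prev and /Next and to its parent with /Parent. The sketch below is a hypothetical helper, not part of any Hackage package. It only builds the object bodies for a flat, one-level outline. It assumes each entry's target page is given as a page object number with generation 0, and that the object numbers starting at `start` are free.

```haskell
import Text.Printf (printf)

-- Escape '\\', '(' and ')' for a PDF literal string (ASCII titles assumed).
escapePdfString :: String -> String
escapePdfString = concatMap esc
  where
    esc c
      | c `elem` "\\()" = ['\\', c]
      | otherwise       = [c]

-- | Hypothetical helper: object bodies for a flat (one-level) outline.
-- The /Outlines root gets object number `start`; the n entries get
-- start+1 .. start+n. Each entry is (title, page object number), and
-- page objects are assumed to have generation 0.
flatOutline :: Int -> [(String, Int)] -> [(Int, String)]
flatOutline start entries =
  (start, rootBody) : zipWith3 item [0 ..] [start + 1 ..] entries
  where
    n = length entries
    rootBody :: String
    rootBody
      | n == 0    = "<< /Type /Outlines /Count 0 >>"
      | otherwise = printf "<< /Type /Outlines /First %d 0 R /Last %d 0 R /Count %d >>"
                      (start + 1) (start + n) n
    -- Entries are siblings, so each links to its neighbours via /Prev and /Next.
    item :: Int -> Int -> (String, Int) -> (Int, String)
    item i num (title, page) = (num, body)
      where
        prevRef = if i > 0 then printf " /Prev %d 0 R" (num - 1) else ""
        nextRef = if i < n - 1 then printf " /Next %d 0 R" (num + 1) else ""
        body = printf "<< /Title (%s) /Parent %d 0 R%s%s /Dest [%d 0 R /Fit] >>"
                 (escapePdfString title) start prevRef nextRef page

main :: IO ()
main = mapM_ (\(num, body) -> putStrLn (show num ++ " 0 obj\n" ++ body ++ "\nendobj"))
             (flatOutline 20 [("Introduction", 3), ("Usage", 9)])
```

Writing these objects is only half of the job. The catalog must also get an /Outlines entry pointing at the root object (20 in the usage example), and the cross-reference data has to be updated to include the new objects.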

Haskell: parsing PDF

2 Answers

Check out the pdf-toolbox library. Its support for PDF file generation is low-level, but powerful enough for your task.

Here is an example of how to change the title of an existing PDF file using the incremental update feature.
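
The following is a rough, library-independent illustration of the incremental update technique. It builds the text to append to an existing file: a new Info dictionary, a one-entry xref section, and a trailer whose /Prev points back at the previous xref. This is not pdf-toolbox's API, and `incrementalTitleUpdate` is a hypothetical name. The sketch assumes an unencrypted file that uses a classic xref table (not an xref stream) and an ASCII title. It also assumes you have already read /Size, /Root and the last startxref offset from the original trailer.

```haskell
import Text.Printf (printf)

-- Escape '\\', '(' and ')' for a PDF literal string.
-- Assumes an ASCII title (no PDFDocEncoding/UTF-16 handling).
escapePdfString :: String -> String
escapePdfString = concatMap esc
  where
    esc c
      | c `elem` "\\()" = ['\\', c]
      | otherwise       = [c]

-- | Hypothetical helper: the text to append to an existing PDF so that its
-- document title changes via an incremental update.
--   origLen  : size of the original file in bytes
--   prevXref : the offset after the original's last "startxref"
--   rootRef  : the original trailer's /Root value, e.g. "1 0 R"
--   oldSize  : the original trailer's /Size value
incrementalTitleUpdate :: Int -> Int -> String -> Int -> String -> String
incrementalTitleUpdate origLen prevXref rootRef oldSize title =
  prefix ++ obj ++ xref ++ trailer
  where
    objNum = oldSize                 -- first unused object number
    prefix = "\n"                    -- start the update on a fresh line
    objOff = origLen + length prefix -- byte offset of "N 0 obj"
    obj, xref, trailer :: String
    -- New Info dictionary (replaces the old one; other keys are not copied).
    obj = printf "%d 0 obj\n<< /Title (%s) >>\nendobj\n"
            objNum (escapePdfString title)
    xrefOff = objOff + length obj
    -- One subsection with one 20-byte entry for the new object.
    xref = printf "xref\n%d 1\n%010d 00000 n \n" objNum objOff
    -- /Prev chains back to the original xref table; "%%%%EOF" prints "%%EOF".
    trailer = printf
      "trailer\n<< /Size %d /Root %s /Info %d 0 R /Prev %d >>\nstartxref\n%d\n%%%%EOF\n"
      (oldSize + 1) rootRef objNum prevXref xrefOff

-- Hypothetical trailer values, for illustration only.
main :: IO ()
main = putStr (incrementalTitleUpdate 1234 1100 "1 0 R" 7 "My (new) title")
```

The original bytes are never modified. You append the result to the file, for example with `Data.ByteString.Char8.appendFile`, which avoids the newline translation that text-mode `appendFile` can do on Windows and that would break the offsets. Readers then follow /Prev back to the older xref sections.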

answered Oct 03 '22 19:10

Yuras

There are a few tools for PDF manipulation, though they seem to be biased towards generation rather than parsing:

http://johnmacfarlane.net/pandoc/

Pandoc is a great cross-markup library, but doesn't support PDF parsing (it does support PDF generation from a variety of formats).

There's also:

http://hackage.haskell.org/package/HsHaruPDF
http://hackage.haskell.org/package/pdf2line -- tool for extracting text from PDF
http://hackage.haskell.org/package/HPDF -- another PDF generation library

I'm not sure we have a good parsing tool yet.

answered Oct 03 '22 19:10

Don Stewart

Tags:

pdf

haskell

artemave