I have a repository of PDF documents, and most of the text in these documents is set in Comic Sans. I would like to change this to something similar to Arial. The original font is embedded in the documents. I haven't found any existing tool that does this (I'm on Linux), and I wonder whether it's possible to do it programmatically. A Python library would be perfect, but a library in any programming language would do.
Which library would let me substitute fonts with the least effort? And which parts of its API would I use?
There are commercial tools that can do this, one of which is pdfToolbox from callas software (disclosure: I'm affiliated with this company).
However, even though this functionality exists and is sometimes used, the results are often completely undesirable, and I have rarely seen it applied to anything beyond a few very specific files, usually with limited success. That is why, in the tool I mentioned, this replacement is only available as a manual operation and not in automatic mode.
Depending on how complex these files are, you would probably have more success extracting all the text from the documents into something like RTF, doing whatever manipulation you need there, and regenerating the PDF afterwards. It sounds like a roundabout way, but I'm guessing the result will be better in most cases.
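To give a rough idea of the regeneration step, here is a minimal Python sketch (standard library only) that writes plain text lines into a new PDF using Helvetica, one of the PDF standard 14 fonts, which Arial was designed to be metric-compatible with. It assumes the text has already been extracted, for example with `pdftotext -layout input.pdf output.txt` from poppler-utils. The function name `text_to_pdf` and its parameters are hypothetical. The font is not embedded, so the viewer supplies Helvetica or a substitute. Long lines are not wrapped, and images and the original layout are lost, which is the trade-off described above.

```python
def _escape(text: str) -> str:
    # Backslashes and parentheses must be escaped inside PDF literal strings.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_to_pdf(lines: list[str], font_size: int = 11, page_width: int = 612,
                page_height: int = 792, margin: int = 72) -> bytes:
    """Render plain text lines into a minimal PDF that uses Helvetica (not embedded)."""
    leading = font_size * 1.2
    per_page = max(1, int((page_height - 2 * margin) // leading))
    chunks = [lines[i:i + per_page] for i in range(0, len(lines), per_page)] or [[]]

    # Object numbers: 1 = catalog, 2 = page tree, 3 = font, then a page/content pair per page.
    objects: list[bytes] = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"",  # page tree, filled in once the page object numbers are known
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica "
        b"/Encoding /WinAnsiEncoding >>",
    ]
    kids = []
    for chunk in chunks:
        page_num = len(objects) + 1
        ops = [f"BT /F1 {font_size} Tf {leading:.2f} TL {margin} {page_height - margin} Td"]
        # The ' operator moves down one line (by TL) and then shows the string.
        ops += [f"({_escape(line)}) '" for line in chunk]
        ops.append("ET")
        # WinAnsiEncoding is close to cp1252; unmappable characters become "?".
        stream = "\n".join(ops).encode("cp1252", errors="replace")
        objects.append(
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {page_width} {page_height}] "
            f"/Resources << /Font << /F1 3 0 R >> >> /Contents {page_num + 1} 0 R >>".encode()
        )
        objects.append(b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream")
        kids.append(f"{page_num} 0 R")
    objects[1] = f"<< /Type /Pages /Kids [{' '.join(kids)}] /Count {len(kids)} >>".encode()

    # Serialize the objects, recording byte offsets for the cross-reference table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_pos)
    return bytes(out)
```

Usage would be along the lines of reading `output.txt` (UTF-8) with `splitlines()`, which also drops the form feeds `pdftotext` puts between pages, and writing the bytes returned by `text_to_pdf(lines)` to a file opened in binary mode. In practice a layout library (or the RTF route above) would do a better job. This sketch only shows that swapping to an Arial-like font is easy once the text is out of the original PDF.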