 

Extract vector image from a PDF file [closed]

Is there a command-line tool on Linux that would extract figures from a PDF file and save them in a vector format? I know about pdfimages, but that would create a bitmap, which is not what I need.

v923z asked Mar 28 '12 08:03




3 Answers

This does not extract images only, which is what you seem to need, but

  • pdftocairo

http://poppler.freedesktop.org/

http://www.manpagez.com/man/1/pdftocairo/ (manpage)

can render a PDF page to other vector formats such as PS, EPS, or SVG.

Assuming you have a PDF page with vector images, you can render that page to SVG and then copy out only the image you are interested in.

Note: pdftocairo cannot render a multipage PDF to a multipage SVG.

If you need to convert several PDF pages to SVG, first pick the page range and then burst it into single-page PDF files.

Example (converting pages 1-10 of a PDF file to SVG):

pdftk file.pdf cat 1-10 output 1-10.pdf

pdftk 1-10.pdf burst

# burst writes pg_0001.pdf, pg_0002.pdf, ...; match only those, not file.pdf or 1-10.pdf
for f in pg_*.pdf; do pdftocairo -svg "$f"; done

Finally, with sodipodi or Inkscape, you can extract the images you are interested in from the SVG rendering of each page.
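As a possible shortcut, pdftocairo's own -f (first page) and -l (last page) options can select a single page, so the pdftk steps may not be needed. The following is only a sketch under that assumption; the file name file.pdf, the helper names svg_name and render_range, and the page-0001.svg naming scheme are made up for illustration.

```shell
#!/bin/sh
# Sketch: render a page range to one SVG per page using pdftocairo alone.
# "file.pdf" and the page-%04d.svg naming are illustrative assumptions.

# Build the output name for a page number, e.g. 3 -> page-0003.svg
svg_name() {
    printf 'page-%04d.svg' "$1"
}

# Render pages $2..$3 of the PDF $1, one SVG file per page.
render_range() {
    pdf=$1 first=$2 last=$3
    page=$first
    while [ "$page" -le "$last" ]; do
        # -f and -l set the first and last page; equal values pick one page
        pdftocairo -svg -f "$page" -l "$page" "$pdf" "$(svg_name "$page")" || return 1
        page=$((page + 1))
    done
}

# Run only when the tool and the input file are actually present.
if command -v pdftocairo >/dev/null 2>&1 && [ -f file.pdf ]; then
    render_range file.pdf 1 10
fi
```

Naming the output explicitly avoids relying on how pdftocairo derives a file name when none is given.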

Dingo answered Oct 05 '22 13:10


What do you consider a "figure"? This is a concept that doesn't exist in PDF. The reason there are so many tools that can extract images from a PDF file is that images are a very clearly identified entity.

Your "figures" however, are much less clearly defined. PDF files may contain lots of vector content that you wouldn't call a figure. Text can be stroked for example, which would make it vector art and as such it might be confused with your figures. Other decorative elements may be used in the background of the pages. Text may be underlined, which would be a vector element...

In the other direction, your "figure" may contain a caption that is text, further complicating things.

As PDF doesn't have the notion of a figure, you'll have to figure out how to isolate one on a PDF page (perhaps because the creator application always adds metadata to them, or because they use a special color, or something similar). If you can isolate them, it should be possible to trim everything irrelevant on the page and export what you need as EPS or SVG using some of the techniques described in the other answer.
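If the figure can be isolated as a rectangle on the page, one way to do the trimming might be pdftocairo's crop options. Its manpage documents -x and -y as the top-left corner of the crop area and -W and -H as its width and height, in points for vector output. This is a rough sketch, not a tested recipe: the coordinates, the file names, and the helper names crop_args and export_region are hypothetical, and how the cropped area is sized on the output canvas is worth checking by opening the resulting SVG.

```shell
#!/bin/sh
# Sketch: export one rectangular region of a page as SVG.
# All file names and coordinates below are hypothetical placeholders.

# Print pdftocairo crop arguments for a box given as
# left, top, right, bottom (points, measured from the top-left corner).
crop_args() {
    left=$1 top=$2 right=$3 bottom=$4
    printf '%s' "-x $left -y $top -W $((right - left)) -H $((bottom - top))"
}

# Export the box from page $2 of the PDF $1 into the SVG file $3.
# The remaining four arguments are the box edges passed to crop_args.
export_region() {
    pdf=$1 page=$2 out=$3
    shift 3
    # Word splitting of the crop arguments is intentional here.
    # shellcheck disable=SC2046
    pdftocairo -svg -f "$page" -l "$page" $(crop_args "$@") "$pdf" "$out"
}

# Run only when the tool and the input file are actually present.
if command -v pdftocairo >/dev/null 2>&1 && [ -f file.pdf ]; then
    export_region file.pdf 3 figure.svg 72 100 372 300
fi
```

Finding the box is still the hard part described above; this only covers the export once the coordinates are known.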

David van Driessche answered Oct 05 '22 12:10


This article describes the tools gpdfx, Inkscape and pdf2svg, which are not completely command-line based but still sound helpful.
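Of those three, pdf2svg is a plain command-line tool that takes the input PDF, the output SVG, and optionally a page number. A minimal sketch of a single-page conversion, assuming an input called file.pdf and a made-up file-p2.svg naming scheme (the helper out_name is hypothetical):

```shell
#!/bin/sh
# Sketch: convert a single page with pdf2svg (argument order: input, output, page).
# "file.pdf" and the output naming are illustrative assumptions.

# Derive an output name from the PDF name and page, e.g. file.pdf 2 -> file-p2.svg
out_name() {
    base=$(basename "$1" .pdf)
    printf '%s-p%s.svg' "$base" "$2"
}

# Run only when the tool and the input file are actually present.
if command -v pdf2svg >/dev/null 2>&1 && [ -f file.pdf ]; then
    pdf2svg file.pdf "$(out_name file.pdf 2)" 2
fi
```

As with pdftocairo, this converts the whole page, so the figure still has to be picked out of the SVG afterwards.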

Falko Menge answered Oct 05 '22 14:10