 

Remove page numbers, headers and footers from a PDF file

Tags: pdftotext

I want to parse a PDF file. To do that, I am using the pdftotext utility, which converts a PDF file into a text file. Now I want to remove the page numbers, headers and footers from the text file.

I am converting the PDF file with the following command:

pdftotext -layout input.pdf output.txt

Can anyone help me on this?

Deepti Kakade asked Jan 12 '15 11:01





1 Answer

You need to crop each page with the -x, -y, -W and -H parameters, or at least -y, -H and -W.

Example:

pdftotext -y 80 -H 650 -W 1000 -nopgbrk -eol unix example.pdf


-y 80   -> start the crop area 80 pixels below the top of each page (removes the header);
-H 650  -> keep 650 pixels below the point set by -y and drop the rest (removes the footer);
-W 1000 -> a high value so that nothing is cropped horizontally (a width has to be specified);

You need to adjust -y and -H for each PDF, for example by reducing -y and increasing -H, until they fit the header and footer.
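The adjustment is simple arithmetic: -y is the header height, and -H is whatever remains of the page once the header and footer are taken off. Here is a minimal sketch of that calculation, assuming the default 72 DPI resolution (so one pixel is one point) and a page height you already know. The crop_args function and its three arguments are hypothetical names made up for this illustration; they are not part of pdftotext.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdftotext): turn header and footer
# heights into crop options. All values are pixels at the default 72 DPI.
crop_args() {
    page_height=$1   # full page height, e.g. 792 for US Letter
    header=$2        # height of the header band to skip
    footer=$3        # height of the footer band to skip
    body=$((page_height - header - footer))
    # -W 1000 is only a width large enough to keep every full line.
    echo "-y $header -H $body -W 1000"
}

# Options for a US Letter page with an 80 px header and a 62 px footer.
crop_args 792 80 62
```

This prints -y 80 -H 650 -W 1000, the same values as in the example command, so it can be used as pdftotext $(crop_args 792 80 62) -nopgbrk -eol unix example.pdf. The page height is reported by pdfinfo example.pdf on its "Page size" line. The header and footer heights still have to be found by trial for each PDF.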

Reinaldo Gil answered Oct 02 '22 04:10
