I want to parse a PDF file. For that I am using the pdftotext
utility, which converts a PDF file into a text file. Now I want to remove the page numbers, headers and footers from the text file.
I am converting the PDF file with the following command:
pdftotext -layout input.pdf output.txt
Can anyone help me on this?
On the Insert tab, select the Page Number icon, and then click Remove Page Numbers. If the Remove Page Numbers button isn't available, double-click in the header or footer, select the page number, and press Delete.
Open the PDF in Acrobat. Choose the Organize Pages tool from the right pane. The Organize Pages toolset is displayed in the secondary toolbar, and the page thumbnails are displayed in the Document area. Select a page thumbnail you want to delete and click the Delete icon to delete the page.
You need to crop the page with the -x, -y, -W and -H options; at least -y, -W and -H are required.
Example:
pdftotext -y 80 -H 650 -W 1000 -nopgbrk -eol unix example.pdf
-y 80 -> start the crop area 80 pixels below the top of each page (removes the header);
-H 650 -> keep 650 pixels of height measured from the -y offset and cut everything below that (removes the footer);
-W 1000 -> a high value so that nothing is cropped horizontally (a width has to be specified);
You need to adjust -y and -H for each PDF, sometimes reducing -y and increasing -H, so that the crop area fits between the header and the footer.
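To avoid guessing -H by trial and error, the arithmetic behind the example can be sketched as a small shell function. This is only an illustration: crop_args is a hypothetical helper name, the 792-point page height and the 62-unit footer are assumed values, and it relies on pdftotext using its default resolution of 72 DPI, where one pixel corresponds to one PDF point.

```shell
# Hypothetical helper (not part of pdftotext): print crop options for a page
# of the given height, skipping `header` units at the top and `footer` units
# at the bottom. Assumes the default 72 DPI, so pixels equal PDF points.
crop_args() {
    page_height=$1
    header=$2
    footer=$3
    # Height of the kept area is whatever remains between header and footer.
    crop_height=$((page_height - header - footer))
    # -W 1000 is the same "wide enough to crop nothing" value as in the example.
    printf '%s\n' "-y $header -H $crop_height -W 1000"
}

# Assumed values: a 792-point-tall page, 80 skipped at the top, 62 at the bottom.
crop_args 792 80 62
```

The printed options can then be passed to pdftotext in place of the hand-picked ones. If pdfinfo is installed alongside pdftotext, its "Page size" line is one place to look up the real page height for a given PDF.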