
Parsing pdf files [closed]

I need to split a large PDF document into smaller files based on the content of the file. We use BCL easyPDF to manipulate PDF files. easyPDF can split a PDF document at a given page number, but it cannot split a document based on its content. As far as I can tell, it also has no search function for determining where a piece of content is located (if I am wrong, please let me know).

Can someone tell me how I can find the location of text in a PDF file using .NET?

Thanks

asked May 03 '12 at 18:05 by desi




1 Answer

You might try the Docotic.Pdf library for this task.

The library can extract text from PDFs (with or without formatting).

Alternatively, you can retrieve a collection of words together with their bounding rectangles from a PDF. That should help you find the location of the text in the file.
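To illustrate how words with bounding rectangles can drive a content-based split, here is a minimal, library-agnostic sketch. It is written in Python only to keep it short; the logic ports directly to C#. Everything in it is an assumption made for illustration: the `Word` record, `find_phrase`, and `split_ranges` are hypothetical names, not part of Docotic.Pdf or easyPDF. It assumes some extraction step has already produced the words of each page in reading order, and it only computes the page ranges that a page-number-based splitter could then be given.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Word:
    """One extracted word plus its bounding rectangle (hypothetical shape)."""
    page: int      # 0-based page index
    text: str
    x: float       # left edge of the bounding rectangle
    y: float       # top or bottom edge, depending on the extractor
    width: float
    height: float


def find_phrase(words: List[Word], phrase: str) -> List[Word]:
    """Return the first word of every occurrence of `phrase`.

    Assumes `words` is in reading order. Matching is case-insensitive,
    and a match must not cross a page boundary.
    """
    target = [t.lower() for t in phrase.split()]
    hits: List[Word] = []
    if not target:
        return hits
    for i in range(len(words) - len(target) + 1):
        window = words[i:i + len(target)]
        same_page = all(w.page == window[0].page for w in window)
        if same_page and [w.text.lower() for w in window] == target:
            hits.append(window[0])
    return hits


def split_ranges(hit_pages: Iterable[int], page_count: int) -> List[Tuple[int, int]]:
    """Turn marker pages into inclusive (first_page, last_page) ranges.

    Each page containing the marker starts a new chunk. Pages before the
    first marker form their own chunk.
    """
    starts = [p for p in sorted(set(hit_pages) | {0}) if 0 <= p < page_count]
    ends = [s - 1 for s in starts[1:]] + [page_count - 1]
    return list(zip(starts, ends))


# Made-up data: a 4-page document where "Invoice Number" opens pages 0 and 2.
words = [
    Word(0, "Invoice", 72.0, 700.0, 40.0, 12.0),
    Word(0, "Number", 116.0, 700.0, 45.0, 12.0),
    Word(1, "continued", 72.0, 700.0, 55.0, 12.0),
    Word(2, "Invoice", 72.0, 700.0, 40.0, 12.0),
    Word(2, "Number", 116.0, 700.0, 45.0, 12.0),
]
hits = find_phrase(words, "invoice number")
print(split_ranges([h.page for h in hits], page_count=4))  # [(0, 1), (2, 3)]
```

Each hit also carries its rectangle (`x`, `y`, `width`, `height`), so the same lookup answers the "where on the page is this text" part of the question. The resulting ranges are 0-based and inclusive; convert them if your splitting tool counts pages from 1.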

Disclaimer: I work for the vendor of the library.

answered Oct 07 '22 at 02:10 by Bobrovsky