Parsing PDF files [closed]

I need to split a large PDF document into smaller files based on its content. We use BCL easyPDF to manipulate PDF files. easyPDF can split a PDF document at a given page number, but it cannot split a document based on its content. As far as I can tell, it also has no search function for determining where a piece of content is located (if I am wrong, please let me know).

How can I find the location of text in a PDF file using .NET?

Thanks

asked May 03 '12 18:05

desi

1 Answer

You might try the Docotic.Pdf library for this task.

The library can extract text from PDFs (with or without formatting).

Alternatively, you can retrieve a collection of words with their bounding rectangles from a PDF. This should help you find the location of the text in the file.
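Once the text of each page has been extracted (by whatever library you choose), deciding where to split comes down to plain string searching. Below is a minimal sketch of that step only. `PdfSplitPlanner`, `PlanRanges`, and the `"Invoice #"` marker are hypothetical names for illustration; they are not part of easyPDF or Docotic.Pdf. The sketch assumes you already have one string per page and that a new section starts on every page containing the marker.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any PDF library.
public static class PdfSplitPlanner
{
    // pageTexts[i] is assumed to hold the text already extracted from page i + 1.
    // Returns inclusive, 1-based (First, Last) page ranges. A new range starts
    // on every page (after the first) whose text contains the marker.
    public static List<(int First, int Last)> PlanRanges(
        IReadOnlyList<string> pageTexts, string marker)
    {
        var ranges = new List<(int First, int Last)>();
        if (pageTexts.Count == 0)
            return ranges;

        int start = 1;
        for (int page = 2; page <= pageTexts.Count; page++)
        {
            // Treat a missing page text as empty so the search cannot throw.
            string text = pageTexts[page - 1] ?? string.Empty;
            bool startsNewSection =
                text.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0;

            if (startsNewSection)
            {
                // Close the range that ends on the previous page.
                ranges.Add((start, page - 1));
                start = page;
            }
        }

        // Close the final range, which runs to the last page.
        ranges.Add((start, pageTexts.Count));
        return ranges;
    }
}
```

For example, with page texts `{ "Invoice #1 ...", "continued", "Invoice #2 ...", "Invoice #3 ..." }` and the marker `"Invoice #"`, `PlanRanges` returns the ranges (1, 2), (3, 3), and (4, 4). Each range can then be handed to a page-number-based split such as the one easyPDF already provides.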

Disclaimer: I work for the vendor of the library.

answered Oct 07 '22 02:10

Bobrovsky

Tags:

c#

parsing

pdf

pdf-scraping
