PDF search on the iPhone

Tags: ios, objective-c, pdf, iphone, ipad

After two days of trying to read annotations from a PDF using Quartz, I managed to do it and posted my code.

Now I'd like to do the same for another frequently asked question: searching PDF documents with Quartz. As before, this question has been asked many times with almost no practical answers, so I need some pointers first, as I haven't implemented this myself yet.

What I tried:

I tried using CGPDFScannerScan and handling the TJ and Tj operators. It returns the right text for some PDFs, whereas for other documents it returns mostly random letters. Maybe it's related to text encoding? Someone pointed out that text blocks (delimited by the BT/ET operators) should be handled instead, but I haven't managed to do so yet. Has anyone managed to extract text from any PDF?
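A likely source of the "random letters" is that the strings inside a TJ array are raw character codes that must be decoded with the current font's encoding, while the numbers between them are position adjustments in thousandths of a text-space unit (some PDFs encode word gaps only through such adjustments). A minimal Swift sketch of flattening a TJ array, assuming the scanner callback has already turned the operand into the hypothetical `TJElement` values below; the `decode` closure and the -250 space threshold are illustrative assumptions, not part of the Quartz API:

```swift
// Hypothetical model of one TJ array element as a scanner callback might see it.
enum TJElement {
    case string([UInt8])     // raw character codes, NOT Unicode
    case adjustment(Double)  // thousandths of a text-space unit
}

/// Flattens a TJ array into text. `decode` must honour the current font's
/// encoding. A sufficiently negative adjustment widens the gap between
/// glyphs, so this sketch guesses that it stands for a space (a heuristic).
func flattenTJ(_ elements: [TJElement],
               spaceThreshold: Double = -250,
               decode: ([UInt8]) -> String) -> String {
    var text = ""
    for element in elements {
        switch element {
        case .string(let codes):
            text += decode(codes)
        case .adjustment(let amount):
            // Only add a space between words, never a leading or doubled one.
            if amount <= spaceThreshold && !text.isEmpty && !text.hasSuffix(" ") {
                text += " "
            }
        }
    }
    return text
}

// Usage, with a naive decoder that assumes the codes are plain ASCII.
let asciiDecode: ([UInt8]) -> String = { codes in
    String(decoding: codes, as: UTF8.self)
}
let line = flattenTJ([.string(Array("Hello".utf8)), .adjustment(-300),
                      .string(Array("w".utf8)), .adjustment(-20),
                      .string(Array("orld".utf8))],
                     decode: asciiDecode)
// line == "Hello world"
```

The ASCII decoder is exactly what breaks on PDFs with custom font encodings; it stands in for a real per-font decoder.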

After that, searching should be easy: store all the text in an NSMutableString and use rangeOfString (if there's a better way, please let me know).
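Searching the collected text is indeed the easy part; the harder part is knowing where on the page a match is. One approach, sketched below in Swift, keeps a per-character array of boxes parallel to the text, so a `range(of:options:range:)` hit (the Swift counterpart of `rangeOfString`) maps straight back to geometry. `PageText` and `Box` are hypothetical names, and the sketch assumes each appended character stays its own grapheme:

```swift
import Foundation

// Minimal box type so the sketch doesn't depend on CoreGraphics.
struct Box { var x, y, width, height: Double }

// Decoded page text plus one box per character, in reading order.
// Assumes every appended Character stays a separate grapheme.
struct PageText {
    private(set) var text = ""
    private(set) var boxes: [Box] = []

    mutating func append(_ character: Character, box: Box) {
        text.append(character)
        boxes.append(box)
    }

    /// Returns the character boxes of every case-insensitive match of `query`.
    func find(_ query: String) -> [[Box]] {
        guard !query.isEmpty else { return [] }
        var results: [[Box]] = []
        var searchStart = text.startIndex
        while let r = text.range(of: query, options: .caseInsensitive,
                                 range: searchStart..<text.endIndex) {
            // Character offsets in `text` line up with indices in `boxes`.
            let lower = text.distance(from: text.startIndex, to: r.lowerBound)
            let upper = text.distance(from: text.startIndex, to: r.upperBound)
            results.append(Array(boxes[lower..<upper]))
            searchStart = r.upperBound
        }
        return results
    }
}

// Usage: ten-unit-wide boxes laid out left to right.
var page = PageText()
for (i, ch) in "Hello hello".enumerated() {
    page.append(ch, box: Box(x: Double(i) * 10, y: 700, width: 10, height: 12))
}
let matches = page.find("hello")  // two matches; the second starts at x == 60
```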

But then how do I highlight the result? I know there are a few operators for finding glyph sizes, so I could calculate the resulting rect from those values, but I've been reading the spec for hours... it's a bloated mess and I'm going insane. Does anyone have a practical explanation?
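For the highlight rectangle, the relevant part of the spec is the glyph displacement formula in its "Text Space Details" section: tx = ((w0 − Tj/1000) × Tfs + Tc + Tw) × Th, where w0 is the glyph width from the font's /Widths array divided by 1000, Tj is the TJ adjustment, Tfs the font size, Tc and Tw the character and word spacing (Tw only for the single-byte code 32) and Th the horizontal scaling. A simplified Swift sketch that sums these displacements for a matched run and maps the run's box through the text matrix; the type names, the default ascent/descent of 800/−200, and ignoring text rise and the CTM are assumptions:

```swift
// Affine matrix in PDF's [a b c d e f] form; points are row vectors [x y 1].
struct PDFMatrix {
    var a = 1.0, b = 0.0, c = 0.0, d = 1.0, e = 0.0, f = 0.0
    func apply(_ x: Double, _ y: Double) -> (x: Double, y: Double) {
        (a * x + c * y + e, b * x + d * y + f)
    }
}

// The parts of the text state that affect horizontal displacement.
struct TextParams {
    var fontSize = 12.0        // Tfs (Tf operator)
    var charSpacing = 0.0      // Tc
    var wordSpacing = 0.0      // Tw
    var horizontalScale = 1.0  // Th = Tz / 100
}

/// tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, with w0 = width / 1000.
/// Word spacing applies only to the single-byte code 32.
func advance(widthInThousandths width: Double, code: UInt8,
             params p: TextParams, tjAdjustment tj: Double = 0) -> Double {
    let tw = code == 32 ? p.wordSpacing : 0
    return ((width / 1000 - tj / 1000) * p.fontSize + p.charSpacing + tw) * p.horizontalScale
}

struct Rect { var minX, minY, maxX, maxY: Double }

/// User-space bounding box of a run of glyphs drawn from text matrix `tm`.
/// `ascent`/`descent` are in thousandths (e.g. from the font descriptor);
/// text rise and the CTM are ignored to keep the sketch short.
func highlightRect(widths: [Double], codes: [UInt8], tm: PDFMatrix, params: TextParams,
                   ascent: Double = 800, descent: Double = -200) -> Rect {
    var runWidth = 0.0
    for (w, code) in zip(widths, codes) {
        runWidth += advance(widthInThousandths: w, code: code, params: params)
    }
    let bottom = descent / 1000 * params.fontSize
    let top = ascent / 1000 * params.fontSize
    // Map the run's four corners from text space to user space.
    let corners = [tm.apply(0, bottom), tm.apply(runWidth, bottom),
                   tm.apply(0, top), tm.apply(runWidth, top)]
    let xs = corners.map { $0.x }
    let ys = corners.map { $0.y }
    return Rect(minX: xs.min()!, minY: ys.min()!, maxX: xs.max()!, maxY: ys.max()!)
}

// Usage: "Hi" in a 12 pt font with 500-unit-wide glyphs, drawn at (100, 700).
let hiRect = highlightRect(widths: [500, 500], codes: [72, 105],
                           tm: PDFMatrix(e: 100, f: 700), params: TextParams())
```

Transforming all four corners (rather than just the origin) keeps the result a valid bounding box even when the text matrix rotates or skews the text.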

Update

User Naveen Thunga found PDFKitten, "a framework for extracting data from PDFs in iOS". I just tried the demo and it seems to work as advertised. I will test it with more PDFs and post the results soon. As a side note, the code looks very good to me; if you are interested in how this stuff works, it's pretty awesome.


asked Nov 04 '10 13:11

ySgPjx

3 Answers

This isn't a simple problem to implement, but it is straightforward.

For any given page you need to scan the page's content stream using the CGPDFScanner API. You need to register callbacks for the PDF operators that affect text on the page: not just TJ/Tj, but also those that set the font, modify the text matrix, etc. You need to build a state machine that updates with each operator and its operands as they are encountered. You need to interpret the text according to the current font's encoding. When you find text that you want to highlight, you'll need to examine the text matrix you've been updating to determine its drawing coordinates. Read the PDF specification (version 1.7 can be downloaded from Adobe) to understand which operators you need to pay attention to.
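The state machine described above could look roughly like this in Swift. It tracks only the text matrix (Tm), text line matrix (Tlm), font size and leading, following the spec's semantics for BT, Tf, TL, Td, TD, T* and Tm; the `TextOp` enum is a hypothetical stand-in for the operands a CGPDFScanner callback would pop, and the reported origin assumes an identity CTM:

```swift
// Affine matrix in PDF's [a b c d e f] form (row-vector convention).
struct Matrix {
    var a = 1.0, b = 0.0, c = 0.0, d = 1.0, e = 0.0, f = 0.0
    static let identity = Matrix()

    /// self × m, i.e. apply self first, then m.
    func concatenating(_ m: Matrix) -> Matrix {
        Matrix(a: a * m.a + b * m.c, b: a * m.b + b * m.d,
               c: c * m.a + d * m.c, d: c * m.b + d * m.d,
               e: e * m.a + f * m.c + m.e, f: e * m.b + f * m.d + m.f)
    }

    static func translation(_ tx: Double, _ ty: Double) -> Matrix {
        Matrix(e: tx, f: ty)
    }
}

// Hypothetical decoded operators (font name of Tf omitted).
enum TextOp {
    case beginText                          // BT
    case font(size: Double)                 // Tf
    case leading(Double)                    // TL
    case moveText(Double, Double)           // Td
    case moveTextSetLeading(Double, Double) // TD
    case nextLine                           // T*
    case setMatrix(Matrix)                  // Tm
    case showGlyph(advance: Double)         // one glyph of Tj/TJ, already measured
}

struct TextState {
    var tm = Matrix.identity   // text matrix
    var tlm = Matrix.identity  // text line matrix
    var fontSize = 0.0
    var leading = 0.0

    mutating func apply(_ op: TextOp) {
        switch op {
        case .beginText:
            tm = .identity
            tlm = .identity
        case .font(size: let size):
            fontSize = size
        case .leading(let tl):
            leading = tl
        case .moveText(let tx, let ty):
            tlm = Matrix.translation(tx, ty).concatenating(tlm)
            tm = tlm
        case .moveTextSetLeading(let tx, let ty):
            leading = -ty
            apply(.moveText(tx, ty))
        case .nextLine:
            apply(.moveText(0, -leading))
        case .setMatrix(let m):
            tm = m
            tlm = m
        case .showGlyph(advance: let tx):
            // Showing text moves only Tm; Tlm stays at the start of the line.
            tm = Matrix.translation(tx, 0).concatenating(tm)
        }
    }

    /// Current text origin in user space, assuming an identity CTM.
    var origin: (x: Double, y: Double) { (tm.e, tm.f) }
}

// Usage: BT /F1 12 Tf 72 720 Td 14 TL, one glyph advancing 30 units, then T*.
var state = TextState()
let ops: [TextOp] = [.beginText, .font(size: 12), .moveText(72, 720),
                     .leading(14), .showGlyph(advance: 30), .nextLine]
for op in ops { state.apply(op) }
// state.origin is now (72, 706): T* returns to the line start, 14 units lower.
```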

Font encoding is perhaps the most difficult part, since there are a handful of ways an encoding can be specified, and some of them are proprietary to the font. Mostly you can cheat and fall back on a subset of ANSI encoding, but this WILL break on certain PDFs that use unusual fonts.
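For simple fonts, the spec lets the /Encoding entry name a base encoding (StandardEncoding, MacRomanEncoding, WinAnsiEncoding) and patch it with a /Differences array of glyph names; when a font has a /ToUnicode CMap, that is usually the more reliable route to Unicode. A minimal Swift sketch of the /Differences path, assuming a hypothetical subsetted font whose glyphs are numbered from 1 (exactly the case that yields "random letters" if the bytes are read as ASCII); the glyph-name table is a tiny stand-in for the Adobe Glyph List:

```swift
// One entry of a font's /Differences array: an integer code, or a glyph name
// assigned to the current code (which then increments).
enum DiffEntry {
    case code(Int)
    case name(String)
}

/// Builds a code -> glyph-name table from a base encoding plus /Differences.
func buildEncoding(base: [UInt8: String], differences: [DiffEntry]) -> [UInt8: String] {
    var table = base
    var current = 0
    for entry in differences {
        switch entry {
        case .code(let c):
            current = c
        case .name(let n):
            if let code = UInt8(exactly: current) { table[code] = n }
            current += 1
        }
    }
    return table
}

// Tiny stand-in for the Adobe Glyph List (glyph name -> Unicode).
// A real implementation needs the full list, or better, the font's ToUnicode CMap.
let glyphNameToUnicode: [String: Character] = [
    "H": "H", "e": "e", "l": "l", "o": "o", "space": " ", "quoteright": "\u{2019}",
]

/// Decodes raw character codes; unknown codes become U+FFFD.
func decode(_ codes: [UInt8], encoding: [UInt8: String]) -> String {
    var out = ""
    for code in codes {
        if let name = encoding[code], let ch = glyphNameToUnicode[name] {
            out.append(ch)
        } else {
            out.append("\u{FFFD}")
        }
    }
    return out
}

// Usage: a hypothetical subset font whose /Differences is [1 /H /e /l /o].
let subsetEncoding = buildEncoding(base: [:],
                                   differences: [.code(1), .name("H"), .name("e"),
                                                 .name("l"), .name("o")])
let decoded = decode([1, 2, 3, 3, 4], encoding: subsetEncoding)  // "Hello"
```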

Essentially you are processing the page as if you were to render it.


answered Nov 03 '22 04:11

TomSwift

I created a utility class in Objective-C using PDF.js that can both display and search a PDF file.

The utility class supports searching with the 'Highlight all search results' and 'case sensitive' options.

Have a look at PDF search in action: Link

answered Nov 03 '22 04:11

Jageen

As of iOS 11 we have PDFKit, which makes searching text a breeze:

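import PDFKit

if #available(iOS 11.0, *) {
    if let pdfDocument = PDFDocument(url: fileUrl) {
        // All text in the PDF, separated by \n (optional: may be nil)
        let allText = pdfDocument.string

        // findString returns every match as an array of PDFSelection
        let selections: [PDFSelection] = pdfDocument.findString("Hello", withOptions: [])
        // The first match, including its formatting
        let sWithFormatting = selections.first?.attributedString
    }
}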

answered Nov 03 '22 04:11

Alexandre G
