
Extract a single page (or range of pages) from pdf data without loading the whole pdf (which takes too much RAM sometimes)

Tags: ios, pdf, swift

Using PDFKit in Swift, you can use PDFDocument to open PDF files. That's easy and works well. But I'm building a custom PDF viewer (for comic book PDFs) that suits my needs, and I have one problem: a viewer doesn't need the whole PDF file in memory. I only need a few pages at a time.
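To make "a few pages at a time" concrete, here is a minimal sketch of the bookkeeping a viewer might use to decide which pages to keep resident. The function name `pageWindow` and the `radius` parameter are hypothetical and not part of PDFKit; this only computes indices and does not by itself reduce what PDFDocument loads.

```swift
/// Returns the zero-based page indices a viewer might keep in memory
/// around `current`, clamped to the document bounds.
/// `radius` (pages kept on each side) is a hypothetical tuning knob.
func pageWindow(around current: Int, radius: Int, pageCount: Int) -> Range<Int> {
    guard pageCount > 0, radius >= 0 else { return 0..<0 }
    // Clamp the current page into the valid range first
    let clamped = min(max(current, 0), pageCount - 1)
    let lower = max(clamped - radius, 0)
    let upper = min(clamped + radius + 1, pageCount)
    return lower..<upper
}
```

Pages that fall outside the returned range would be candidates for release when the user turns a page.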

Also, the pdfs consist only of images. There's no text or anything.

When instantiating a PDFDocument, the whole PDF data is loaded into memory. With really huge PDF files (over 1 GB) this isn't optimal and can crash on some devices. As far as I know, PDFKit offers no way to load only parts of a PDF document.

Is there anything I can do about that? I haven't found a Swift/Obj-C library that can do this (though I don't really know the right keywords to search for).

My workaround would be to preprocess the PDFs and save each page as an image in the documents directory (or similar) using FileManager. That would result in a tremendous number of files but would solve the memory problem. I'm not sure I like this approach, though.
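A rough sketch of that preprocessing workaround, assuming Core Graphics' CGPDFDocument and UIKit's UIGraphicsImageRenderer are used for rendering. The names `pageImageURL` and `exportPages` and the "page-0001.png" naming scheme are hypothetical. Whether CGPDFDocument itself avoids pulling the whole file into memory is not confirmed here and would need measuring; page rotation is also not handled specially.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#endif

/// Hypothetical naming scheme: "page-0001.png", "page-0002.png", ...
func pageImageURL(in directory: URL, pageNumber: Int) -> URL {
    let name = String(format: "page-%04d.png", pageNumber)
    return directory.appendingPathComponent(name)
}

#if canImport(UIKit)
/// Renders every page of the PDF at `pdfURL` to a PNG file in `directory`
/// and returns the number of pages written.
func exportPages(of pdfURL: URL, to directory: URL) throws -> Int {
    guard let document = CGPDFDocument(pdfURL as CFURL),
          document.numberOfPages > 0 else { return 0 }
    try FileManager.default.createDirectory(at: directory,
                                            withIntermediateDirectories: true)
    var written = 0
    // CGPDFDocument page numbers start at 1
    for number in 1...document.numberOfPages {
        // Drain temporary objects after each page to keep peak memory low
        try autoreleasepool {
            guard let page = document.page(at: number) else { return }
            let box = page.getBoxRect(.mediaBox)
            let bounds = CGRect(origin: .zero, size: box.size)
            let renderer = UIGraphicsImageRenderer(bounds: bounds)
            let data = renderer.pngData { context in
                let cg = context.cgContext
                UIColor.white.setFill()
                cg.fill(bounds)
                // PDF space has a bottom-left origin; UIKit's is top-left
                cg.translateBy(x: 0, y: bounds.height)
                cg.scaleBy(x: 1, y: -1)
                cg.concatenate(page.getDrawingTransform(.mediaBox, rect: bounds,
                                                        rotate: 0,
                                                        preserveAspectRatio: true))
                cg.drawPDFPage(page)
            }
            try data.write(to: pageImageURL(in: directory, pageNumber: number),
                           options: .atomic)
            written += 1
        }
    }
    return written
}
#endif
```

The viewer would then load only the image files for the pages currently on screen, at the cost of one file per page on disk.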

Update:

So I did what @Prcela and @Sahil Manchanda proposed. It seems to be working for now.

@yms: Hm, that could indeed be a problem. Does this even happen when the PDF contains only images and nothing else?

@Carpsen90: They are local (saved in the documents directory).

EDIT: I haven't accepted the answer below or awarded it the bounty; that happened automatically. It does not solve the problem. It still loads the entire PDF into memory!

Quantm asked Sep 01 '18 11:09



1 Answer

I have an idea for how you could achieve this in PDFKit. According to the documentation, PDFDocument has a function that creates a selection spanning certain pages, which would probably solve your problem if you added it to a collectionFlowView.

func selection(from startPage: PDFPage, atCharacterIndex startCharacter: Int, to endPage: PDFPage, atCharacterIndex endCharacter: Int) -> PDFSelection?

However, since your PDFs mainly contain images, there is another function that selects parts of the PDF based on CGPoints:

func selection(from startPage: PDFPage, at startPoint: CGPoint, to endPage: PDFPage, at endPoint: CGPoint) -> PDFSelection?

Also have a look at PDFView (https://developer.apple.com/documentation/pdfkit/pdfview), as this might be what you need if you only want to view the pages without editing annotations etc.

I also prepared a short snippet below that extracts one page. Hope it helps.

import PDFKit
import UIKit

class PDFViewController: UIViewController {

    override func viewDidLoad() {
        super.viewDidLoad()

        guard let url = Bundle.main.url(forResource: "myPDF", withExtension: "pdf") else { fatalError("INVALID URL") }
        let pdf = PDFDocument(url: url)
        // page(at:) is zero-based, so index 10 is the 11th page; the result is an optional PDFPage
        let page = pdf?.page(at: 10)
        // now you have one page extracted and you can play around with it.
    }
}
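For a range of pages rather than a single one, a minimal sketch could copy the pages into a new, separate PDFDocument. The helper name `extractPages` is hypothetical; the PDFKit calls used are `page(at:)`, `insert(_:at:)` and `pageCount`. Note that the source document is still opened in full first, so this does not address the memory concern from the question.

```swift
#if canImport(PDFKit)
import PDFKit

/// Copies the pages in `range` (zero-based) from `source`
/// into a new, separate PDFDocument.
func extractPages(_ range: Range<Int>, from source: PDFDocument) -> PDFDocument {
    let result = PDFDocument()
    // Ignore indices that fall outside the source document
    let valid = range.clamped(to: 0..<source.pageCount)
    for index in valid {
        // copy() so the same page object is not shared by two documents
        guard let page = source.page(at: index)?.copy() as? PDFPage else { continue }
        result.insert(page, at: result.pageCount)
    }
    return result
}
#endif
```

The result could then be saved with `write(to:)` or assigned to a PDFView's `document` property.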

EDIT 1: Have a look at this code excerpt. I understand that the whole PDF gets loaded; however, this approach might be more memory efficient, as iOS may handle it better in a PDFView:

func readBook() {
    // Remove the old book view when the user chooses a new book language
    if let oldBookView = self.view.viewWithTag(3) {
        oldBookView.removeFromSuperview()
    }

    if #available(iOS 11.0, *) {
        let pdfView = PDFView()
        let path = BookManager.getBookPath(bookLanguageCode: book.bookLanguageCode)
        let url = URL(fileURLWithPath: path)
        if let pdfDocument = PDFDocument(url: url) {
            pdfView.displayMode = .singlePageContinuous
            pdfView.autoScales = true
            pdfView.document = pdfDocument
            // Tag the view so it can easily be found and removed later when the user chooses a new book language
            pdfView.tag = 3
            let lastReadPage = getLastReadPage()

            if let page = pdfDocument.page(at: lastReadPage) {
                pdfView.go(to: page)
                // Subscribe to notifications so the last read page can be saved.
                // Must subscribe after displaying the last read page, or else the first page will be displayed instead.
                NotificationCenter.default.addObserver(self, selector: #selector(self.saveLastReadPage), name: .PDFViewPageChanged, object: nil)
            }
        }

        self.containerView.addSubview(pdfView)
        setConstraints(view: pdfView)
        addTapGesture(view: pdfView)
    }
}

EDIT 2: This is not the answer the OP was looking for. It also loads the whole PDF into memory. Read the comments.

AD Progress answered Oct 17 '22 15:10