I have the following function to convert a PDF into a series of images (one image per page):
import Quartz
func convertPDF(at sourceURL: URL, to destinationURL: URL, fileType: NSBitmapImageFileType, dpi: CGFloat = 200) throws -> [URL] {
    let fileExtension: String
    switch fileType {
    case .BMP: fileExtension = "bmp"
    case .GIF: fileExtension = "gif"
    case .JPEG: fileExtension = "jpeg"
    case .JPEG2000: fileExtension = "jp2"
    case .PNG: fileExtension = "png"
    case .TIFF: fileExtension = "tiff"
    }
    let data = try Data(contentsOf: sourceURL)
    let pdfImageRep = NSPDFImageRep(data: data)!
    var imageURLs = [URL]()
    for i in 0..<pdfImageRep.pageCount {
        pdfImageRep.currentPage = i
        // Convert the page size from points (1/72 inch) to pixels at the requested DPI
        let width = pdfImageRep.size.width / 72 * dpi
        let height = pdfImageRep.size.height / 72 * dpi
        let image = NSImage(size: CGSize(width: width, height: height), flipped: false) { dstRect in
            pdfImageRep.draw(in: dstRect)
        }
        let bitmapImageRep = NSBitmapImageRep(data: image.tiffRepresentation!)!
        let bitmapData = bitmapImageRep.representation(using: fileType, properties: [:])!
        let imageURL = destinationURL.appendingPathComponent("\(sourceURL.deletingPathExtension().lastPathComponent)-Page\(i+1).\(fileExtension)")
        try bitmapData.write(to: imageURL, options: [.atomic])
        imageURLs.append(imageURL)
    }
    return imageURLs
}
This works fine. Performance is not blisteringly fast, but that's not critical. My problem is memory consumption. Let's say I'm converting a long PDF (Apple's 10-Q, 51 pages long):
let sourceURL = URL(string: "http://files.shareholder.com/downloads/AAPL/4907179320x0x952191/4B5199AE-34E7-47D7-8502-CF30488B3B05/10-Q_Q3_2017_As-Filed_.pdf")!
let destinationURL = URL(fileURLWithPath: "/Users/mike/PDF")
let _ = try convertPDF(at: sourceURL, to: destinationURL, fileType: .PNG, dpi: 200)
The memory usage keeps increasing, reaching ~11 GB by the end of the last page!
I also noticed that bitmapImageRep and bitmapData don't appear to be released between iterations. So how can I reduce the memory footprint? Is there a better way to convert a PDF to images?
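The question doesn't try wrapping each loop iteration in autoreleasepool, but that is a commonly suggested fix for this kind of growth. NSImage, tiffRepresentation and NSBitmapImageRep can return autoreleased objects. In a tight loop, those objects are only freed when the enclosing pool drains, which may not happen until the whole function returns.

Below is a minimal sketch of that approach. It assumes Swift 4.2+ AppKit names (NSBitmapImageRep.FileType.png rather than .PNG) and PNG output only. The function name renderPagesWithAutoreleasePool is hypothetical, and I haven't measured its effect on this particular PDF:

```swift
import AppKit

/// Hypothetical helper: renders every page of `pdfData` to PNG files in `directory`,
/// draining autoreleased objects at the end of each page.
func renderPagesWithAutoreleasePool(pdfData: Data, to directory: URL, baseName: String, dpi: CGFloat = 200) throws -> [URL] {
    guard let pdfImageRep = NSPDFImageRep(data: pdfData) else {
        throw CocoaError(.fileReadCorruptFile)
    }
    var imageURLs = [URL]()
    for i in 0..<pdfImageRep.pageCount {
        // Objects autoreleased inside this closure (the NSImage, its TIFF data,
        // the bitmap rep and the PNG data) are released when the closure returns.
        try autoreleasepool {
            pdfImageRep.currentPage = i
            let size = CGSize(width: pdfImageRep.size.width / 72 * dpi,
                              height: pdfImageRep.size.height / 72 * dpi)
            let image = NSImage(size: size, flipped: false) { dstRect in
                pdfImageRep.draw(in: dstRect)
            }
            guard let tiff = image.tiffRepresentation,
                  let bitmap = NSBitmapImageRep(data: tiff),
                  let png = bitmap.representation(using: .png, properties: [:]) else {
                throw CocoaError(.fileWriteUnknown)
            }
            let url = directory.appendingPathComponent("\(baseName)-Page\(i + 1).png")
            try png.write(to: url, options: .atomic)
            imageURLs.append(url)
        }
    }
    return imageURLs
}
```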
After struggling with this for a whole day, I ended up answering my own question.
The solution is to drop down into the Core Graphics and Image I/O frameworks and render each PDF page into a bitmap context. This problem lends itself very well to parallelization, since each page can be converted into a bitmap on its own thread.
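The parallel half of this idea can be sketched on its own. DispatchQueue.concurrentPerform calls a closure once per index, spreading the calls across the available cores. The part that needs care is collecting the results: assigning into a shared Swift Array from several threads at once is a data race.

Below is a minimal sketch that preallocates one slot per page and writes through an unsafe buffer pointer. It assumes each iteration is independent. The name parallelMap is hypothetical and not a Dispatch API:

```swift
import Dispatch

/// Hypothetical helper (not part of Dispatch): calls `work` once for every index
/// in 0..<count, spread across the available cores, and returns the results in
/// index order. `work` must be safe to call from several threads at once.
func parallelMap<T>(count: Int, _ work: (Int) -> T) -> [T] {
    // Preallocate one slot per index so no iteration ever resizes the array.
    var results = [T?](repeating: nil, count: count)
    results.withUnsafeMutableBufferPointer { buffer in
        // Copy the buffer pointer so the closure below captures a plain value.
        // Each iteration writes only to its own slot, so there is no data race.
        let slots = buffer
        DispatchQueue.concurrentPerform(iterations: count) { i in
            slots[i] = work(i)
        }
    }
    // Every slot was filled exactly once above, so force-unwrapping is safe.
    return results.map { $0! }
}
```

In a PDF converter, rendering and writing one page would be the work closure, and the returned array would be the list of image URLs.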
import Foundation
import CoreGraphics
import ImageIO
import CoreServices // kUTType* constants (deprecated since macOS 12 in favor of UniformTypeIdentifiers)
struct ImageFileType {
    var uti: CFString
    var fileExtension: String
    // This list can include anything returned by CGImageDestinationCopyTypeIdentifiers()
    // I'm including only the popular formats here
    static let bmp = ImageFileType(uti: kUTTypeBMP, fileExtension: "bmp")
    static let gif = ImageFileType(uti: kUTTypeGIF, fileExtension: "gif")
    static let jpg = ImageFileType(uti: kUTTypeJPEG, fileExtension: "jpg")
    static let png = ImageFileType(uti: kUTTypePNG, fileExtension: "png")
    static let tiff = ImageFileType(uti: kUTTypeTIFF, fileExtension: "tiff")
}
func convertPDF(at sourceURL: URL, to destinationURL: URL, fileType: ImageFileType, dpi: CGFloat = 200) throws -> [URL] {
    guard let pdfDocument = CGPDFDocument(sourceURL as CFURL) else {
        throw CocoaError(.fileReadCorruptFile)
    }
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    let bitmapInfo = CGImageAlphaInfo.noneSkipLast.rawValue
    let imageName = sourceURL.deletingPathExtension().lastPathComponent
    var urls = [URL](repeating: URL(fileURLWithPath: "/"), count: pdfDocument.numberOfPages)
    // Writing to a shared Swift Array from several threads is a data race,
    // so each iteration writes only its own slot through a buffer pointer.
    urls.withUnsafeMutableBufferPointer { buffer in
        let output = buffer
        DispatchQueue.concurrentPerform(iterations: output.count) { i in
            // Page number starts at 1, not 0
            let pdfPage = pdfDocument.page(at: i + 1)!
            let mediaBoxRect = pdfPage.getBoxRect(.mediaBox)
            let scale = dpi / 72.0
            let width = Int(mediaBoxRect.width * scale)
            let height = Int(mediaBoxRect.height * scale)
            let context = CGContext(data: nil, width: width, height: height, bitsPerComponent: 8, bytesPerRow: 0, space: colorSpace, bitmapInfo: bitmapInfo)!
            context.interpolationQuality = .high
            // Paint a white background under the page
            context.setFillColor(.white)
            context.fill(CGRect(x: 0, y: 0, width: width, height: height))
            context.scaleBy(x: scale, y: scale)
            // Shift the media box origin to (0, 0) in case it is not already there
            context.translateBy(x: -mediaBoxRect.origin.x, y: -mediaBoxRect.origin.y)
            context.drawPDFPage(pdfPage)
            let image = context.makeImage()!
            let imageURL = destinationURL.appendingPathComponent("\(imageName)-Page\(i + 1).\(fileType.fileExtension)")
            let imageDestination = CGImageDestinationCreateWithURL(imageURL as CFURL, fileType.uti, 1, nil)!
            CGImageDestinationAddImage(imageDestination, image, nil)
            CGImageDestinationFinalize(imageDestination)
            output[i] = imageURL
        }
    }
    return urls
}
Usage:
let sourceURL = URL(string: "http://files.shareholder.com/downloads/AAPL/4907179320x0x952191/4B5199AE-34E7-47D7-8502-CF30488B3B05/10-Q_Q3_2017_As-Filed_.pdf")!
let destinationURL = URL(fileURLWithPath: "/Users/mike/PDF")
let urls = try convertPDF(at: sourceURL, to: destinationURL, fileType: .png, dpi: 200)
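The ImageFileType list only covers a few popular formats. The comment inside it points to CGImageDestinationCopyTypeIdentifiers() for the full set.

Below is a minimal sketch that lists every UTI Image I/O can write on the current system. Any of these could be used as the uti of a new ImageFileType. The list varies by OS version, so no output is shown:

```swift
import ImageIO

// Every uniform type identifier that CGImageDestination can write on this machine.
let writableTypes = CGImageDestinationCopyTypeIdentifiers() as! [String]
for uti in writableTypes.sorted() {
    print(uti)
}
```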
Conversion is now blisteringly fast, and memory usage is a lot lower. Obviously, the higher the DPI, the more CPU and memory it needs. I'm not sure about GPU acceleration, as I only have a weak Intel integrated GPU.
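To put rough numbers on how DPI drives memory: each page is rendered into a bitmap with 8 bits per component and 4 bytes per pixel (RGB plus an unused byte, per CGImageAlphaInfo.noneSkipLast). Memory per page therefore grows with the square of the DPI.

Below is a rough sketch under that assumption. approximateBitmapBytes is a hypothetical helper, and the result is a lower bound, because Core Graphics may pad rows when bytesPerRow is 0. Since concurrentPerform runs roughly one iteration per CPU core at a time, peak usage is on the order of this figure times the core count, plus the encoder's own buffers.

```swift
/// Hypothetical helper: approximate bytes for one page's RGBX bitmap
/// (8 bits per component, 4 bytes per pixel), ignoring any row padding.
func approximateBitmapBytes(pageWidthPoints: Double, pageHeightPoints: Double, dpi: Double) -> Int {
    // Multiply before dividing so exact cases like 612 * 200 / 72 stay exact.
    let width = Int(pageWidthPoints * dpi / 72.0)
    let height = Int(pageHeightPoints * dpi / 72.0)
    return width * height * 4
}

// US Letter is 612 x 792 points (8.5 x 11 in).
let letterAt200 = approximateBitmapBytes(pageWidthPoints: 612, pageHeightPoints: 792, dpi: 200)
// 1700 x 2200 px x 4 bytes = 14,960,000 bytes, about 15 MB per page
print(letterAt200)
```

Doubling the DPI to 400 quadruples this to 59,840,000 bytes per page.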