 

How To Optimize Storage Of NSAttributedString In Swift Using Data And Codable?

Tags:

swift

codable

I am trying to optimize storage space when saving the contents of an NSTextView, namely its NSTextStorage property, itself an NSAttributedString.

Saving it as Data, for example using the rtfd(from:documentAttributes:) method, and as part of a Codable structure, results in a very large string, much larger than the content itself, especially when inserting an image into the NSTextView. For example, inserting a 200K image will result in a 5MB JSON file.

Side note: It is even worse when the Data object is encoded directly rather than as a property of the encoded object, as it is encoded in the form of an array of small integers rather than an arbitrary string. I am not sure why, though I was able to prevent this by inserting the Data into a simple wrapper structure.
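For reference, the wrapper workaround described above can be sketched as follows. `DataWrapper` is a hypothetical name, and the sketch assumes JSONEncoder with its default `.base64` data strategy. Inside a keyed container the bytes become a single base64 string, which on its own costs about 4/3 of the raw size (every 3 bytes become 4 characters).

```swift
import Foundation

// Hypothetical wrapper: putting the Data inside a keyed container makes
// JSONEncoder apply its dataEncodingStrategy (.base64 by default).
struct DataWrapper: Codable {
    var data: Data
}

let payload = Data([1, 2, 3])
let encoder = JSONEncoder()
encoder.dataEncodingStrategy = .base64 // the default, set explicitly for clarity
let json = try! encoder.encode(DataWrapper(data: payload))
print(String(decoding: json, as: UTF8.self)) // {"data":"AQID"}

// Decoding reverses the base64 step and restores the original bytes.
let decoded = try! JSONDecoder().decode(DataWrapper.self, from: json)
print(decoded.data == payload) // true
```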

Strangely, compressing the actual JSON file using ZIP still results in a 4MB file, merely a 20% gain, so it is unclear to me how a 200K image could turn into such a massive, hardly compressible encoded string.

I would like to figure out what is the proper way to efficiently store NSAttributedString using the Codable protocol. Any hint or advice is much appreciated.

I am also wondering whether there is a valid binary encoding option for Codable.

jmdecombe asked Nov 24 '18 19:11


1 Answer

TL;DR: RTFD encodes images as PNGs, but you can make it encode JPGs instead to save space. A custom format might be better and easier though if you have the time to create one.

NSAttributedString can encode to HTML, RTF, RTFD, plain text, a variety of Office/Word formats, etc. Given that each of these is an official format with an official spec that must be followed, there's not much that can be done in terms of saving space other than:

  1. Choosing the supported format that works best for your use cases and has the smallest footprint.

OR

  2. Writing your own format.

Approach 1: RTFD

Of the supported formats, RTFD does indeed sound best for your use case because it supports attachments such as images. Feel free to try out the other included formats, which are described below in "Other Formats".

> Saving it as Data, for example using the rtfd(from:documentAttributes:) method, and as part of a Codable structure, results in a very large string, much larger than the content itself especially when inserting an image into the NSTextView. For example, inserting a 200K image will result in a 5MB JSON file.

To understand what is happening here, try out the following code:

import AppKit

do {
    // Write the RTFD package as a directory so its contents can be inspected in Finder
    let range = NSRange(location: 0, length: someAttributedString.length)
    if let rtfd = someAttributedString.rtfdFileWrapper(from: range, documentAttributes: [:]) {
        try rtfd.write(to: URL(fileURLWithPath: "/Users/yourname/someFolder/RTFD.rtfd"), options: .atomic, originalContentsURL: nil)
    }
} catch {
    print("\(error)")
}

When you call rtfd(from:documentAttributes:), you're getting flat Data. This flat data can then be encoded somewhere and read back into NSAttributedString. But make no mistake: RTFD is a package format ("D" stands for directory). So by instead calling rtfdFileWrapper(from:documentAttributes:), and writing that to a URL with the rtfd extension, we can see the actual package format that rtfd(from:documentAttributes:) replicates, but as a directory instead of raw data. In Finder, right click the generated file and choose "Show Package Contents".
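If you would rather inspect the package from code than in Finder, a minimal sketch along these lines should list each file that rtfdFileWrapper(from:documentAttributes:) puts in the package, with its size. The helper name `rtfdPackageContents(of:)` is hypothetical; expect an RTF file (typically named TXT.rtf) plus one entry per attachment.

```swift
import AppKit

// Hypothetical helper: returns each file inside the RTFD package with its size in bytes.
func rtfdPackageContents(of attributed: NSAttributedString) -> [String: Int] {
    let range = NSRange(location: 0, length: attributed.length)
    guard let package = attributed.rtfdFileWrapper(from: range, documentAttributes: [:]),
          let children = package.fileWrappers else { return [:] }
    var sizes: [String: Int] = [:]
    for (name, child) in children {
        // Regular files expose their bytes; anything else is reported as 0
        sizes[name] = child.regularFileContents?.count ?? 0
    }
    return sizes
}

let sample = NSAttributedString(string: "Hello World! There's so much to see!")
for (name, size) in rtfdPackageContents(of: sample).sorted(by: { $0.key < $1.key }) {
    print("\(name): \(size) bytes")
}
```

Running it on a string that contains an image attachment is a quick way to see which format the image was written out in.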

The RTFD package contains an RTF file that specifies the text and attributes, and a copy of each attachment. So why was your example so much bigger? In my tests, the answer seems to be that RTFD expects to find its images in PNG format. When calling rtfdFileWrapper(from:documentAttributes:) or rtfd(from:documentAttributes:), any image attachments seem to get written out as PNG files, which take up significantly more space. This happens because your image gets wrapped in an NSImage before getting wrapped in an NSTextAttachment. The NSImage is able to write the image data out in other formats, including larger formats like PNG.

I'm assuming the image you tried was in a compressed format like JPEG, and NSAttributedString wrote it to RTFD as PNG.

Using JPEG instead

Assuming you're okay with the image being compressed and losing information such as an alpha channel, you should be able to create an RTFD file with JPEG images.

For example, I was able to get an RTFD file down to 2.8 MB from over 12 MB (large image) just by replacing the generated PNG image with the original JPEG one. TextEdit initially refused to open the result, but after I changed the image's file extension to .png (even though the data was still JPEG), it accepted it.

In code it was even simpler. You may be able to get away with just changing how you add image attachments.

import AppKit
import CoreServices // for kUTTypeJPEG

// Don't do this unless you want PNG
let image = NSImage(contentsOf: ...) // NSImage will write to a larger PNG file
let pngAttachment = NSTextAttachment()
pngAttachment.image = image

// Do this if you want smaller files
let jpegData = try? Data(contentsOf: ...) // This will remain in raw JPEG format
let jpegAttachment = NSTextAttachment(data: jpegData, ofType: kUTTypeJPEG as String) // Explicitly specify JPEG (UTType.jpeg.identifier on macOS 11+)

Then, when you create a new NSAttributedString with that NSTextAttachment and append it to NSTextStorage, the RTFD data you write will be significantly smaller.
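Putting that together, a small helper might look like this. `appendJPEGAttachment(_:to:)` is a hypothetical name, "public.jpeg" is the UTI string that kUTTypeJPEG stands for, and the four bytes in the usage lines are a stand-in rather than a real picture. In an app you would pass textView.textStorage and bytes read from an actual JPEG file.

```swift
import AppKit

// Hypothetical helper: wraps already-encoded JPEG bytes in an attachment so that
// RTFD output can keep them as JPEG instead of an NSImage being written out as PNG.
func appendJPEGAttachment(_ jpegData: Data, to storage: NSTextStorage) {
    let attachment = NSTextAttachment(data: jpegData, ofType: "public.jpeg")
    // NSAttributedString(attachment:) produces one U+FFFC character carrying the attachment
    storage.append(NSAttributedString(attachment: attachment))
}

// Usage with stand-in data: SOI/EOI markers only, not a viewable image.
let storage = NSTextStorage(string: "Photo: ")
let jpegBytes = Data([0xFF, 0xD8, 0xFF, 0xD9])
appendJPEGAttachment(jpegBytes, to: storage)
print(storage.length) // 8
```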

Of course, you may not have control of this process if you're relying on Cocoa UI/API for attaching images. That could make the process more difficult and you may need to resort to modifying the generated data by swapping images.

Approach 2: Custom Format

The approach described immediately above might be inconvenient due to not having control of the attachment-adding process and needing flat data. In that case a custom format might be better.

There's nothing stopping you from designing your own format (binary, text, package, whatever) and then writing a coder for it. You could specify a specific image format or support a variety. It's up to you. And unless you're a fancy word processor, you probably don't need to store all the attributes like font all the time.
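As one illustration, here is a minimal sketch of such a custom format built on Codable. Every type and field name is hypothetical, and keeping only bold ranges while storing images as their original bytes plus a UTI is an assumption about what a simple app might need, not something any spec requires.

```swift
import Foundation

// Hypothetical image record: original bytes plus an explicit type, never re-encoded.
struct StoredImage: Codable, Equatable {
    var uti: String      // e.g. "public.jpeg"; which types to accept is up to the app
    var bytes: Data      // the file's original bytes
    var location: Int    // UTF-16 offset of the image's placeholder character in `text`
}

// Hypothetical document: plain text plus only the styling this app cares about.
struct StoredDocument: Codable, Equatable {
    var text: String
    var boldRanges: [Range<Int>]   // UTF-16 offsets of bold runs
    var images: [StoredImage]
}

let doc = StoredDocument(
    text: "Hello \u{FFFC}",
    boldRanges: [0..<5],
    images: [StoredImage(uti: "public.jpeg", bytes: Data([0xFF, 0xD8, 0xFF, 0xD9]), location: 6)]
)

// Any Codable encoder works here; JSON is used only because it is easy to inspect.
let encoded = try! JSONEncoder().encode(doc)
let restored = try! JSONDecoder().decode(StoredDocument.self, from: encoded)
print(restored == doc) // true
```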

> I am also wondering whether there is a valid binary encoding option for Codable.

First, note that NSAttributedString is an Objective-C class (when used on Apple platforms) and conforms to NSSecureCoding instead of Codable.

Note that you cannot extend NSAttributedString to conform to Codable, because the init(from:) requirement on Decodable can only be satisfied by guaranteeing that the initializer will be included on all subclasses as well. Since this class is non-final, that means it can only be satisfied by a required init. Required initializers can only be specified in the original declaration, not in extensions.
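The restriction is easy to reproduce with your own classes. In this sketch `ThirdPartyText` stands in for NSAttributedString (a non-final class you don't own) and `OwnText` is a final class; both names are hypothetical. Only the final class can pick up Decodable in an extension, via a convenience initializer.

```swift
import Foundation

// Stand-in for NSAttributedString: a non-final class whose declaration we can't edit.
class ThirdPartyText {
    let text: String
    init(text: String) { self.text = text }
}

// The compiler rejects the following, because a non-final class can only satisfy
// Decodable's init(from:) with a `required` initializer in the class body itself:
//
// extension ThirdPartyText: Decodable {
//     convenience init(from decoder: Decoder) throws { ... }
// }

// A final class has no subclasses, so a convenience initializer added in an
// extension is allowed to satisfy the requirement.
final class OwnText {
    let text: String
    init(text: String) { self.text = text }
}

extension OwnText: Codable {
    private enum CodingKeys: String, CodingKey { case text }

    convenience init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        self.init(text: try container.decode(String.self, forKey: .text))
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(text, forKey: .text)
    }
}

let json = try! JSONEncoder().encode(OwnText(text: "Hello"))
let roundTripped = try! JSONDecoder().decode(OwnText.self, from: json)
print(roundTripped.text) // Hello
```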

For this reason, if you wanted to conform it to Codable, you would need to use a wrapper object. enumerateAttributes(in:options:using:) should be helpful for getting the attributes and the ranges of characters that need to be encoded, but you'll need to be sure to pay attention to the images (attachments) too.
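A minimal sketch of such a wrapper, assuming you only need to preserve one attribute. To keep it runnable without AppKit it round-trips a hypothetical string attribute (`.highlight`, raw value "com.example.highlight"); a real wrapper would map .font, colors and .attachment in the same loop. `Run` and `CodableAttributedString` are made-up names.

```swift
import Foundation

// Hypothetical attribute used only in this sketch.
extension NSAttributedString.Key {
    static let highlight = NSAttributedString.Key(rawValue: "com.example.highlight")
}

// One run of characters sharing the same attributes.
struct Run: Codable, Equatable {
    var text: String
    var highlight: String?
}

// Codable wrapper around a (non-Codable) NSAttributedString.
struct CodableAttributedString: Codable {
    var runs: [Run]

    init(_ attributed: NSAttributedString) {
        let nsText = NSString(string: attributed.string)
        var collected: [Run] = []
        let whole = NSRange(location: 0, length: attributed.length)
        // Each callback covers a maximal range with identical attributes
        attributed.enumerateAttributes(in: whole, options: []) { attributes, range, _ in
            collected.append(Run(text: nsText.substring(with: range),
                                 highlight: attributes[.highlight] as? String))
        }
        runs = collected
    }

    // Rebuilds an NSAttributedString from the stored runs.
    var attributedString: NSAttributedString {
        let result = NSMutableAttributedString()
        for run in runs {
            var attributes: [NSAttributedString.Key: Any] = [:]
            if let highlight = run.highlight { attributes[.highlight] = highlight }
            result.append(NSAttributedString(string: run.text, attributes: attributes))
        }
        return result
    }
}

let original = NSMutableAttributedString(string: "plain ")
original.append(NSAttributedString(string: "marked", attributes: [.highlight: "yellow"]))

let data = try! JSONEncoder().encode(CodableAttributedString(original))
let decoded = try! JSONDecoder().decode(CodableAttributedString.self, from: data)
print(decoded.attributedString.string) // plain marked
```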

As for encoding in binary, Codable is completely agnostic to format, so you could write your own types conforming to Encoder and Decoder that do whatever you want, including storing everything as raw bytes.
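That said, Foundation already ships one binary option: PropertyListEncoder with outputFormat set to .binary, which stores Data as raw bytes instead of base64. A minimal sketch, with a hypothetical `Note` type standing in for your document structure:

```swift
import Foundation

// Hypothetical document type standing in for your own Codable structure.
struct Note: Codable, Equatable {
    var title: String
    var attachment: Data
}

let note = Note(title: "Hello", attachment: Data(repeating: 0xAB, count: 1_000))

// Binary property lists store Data as raw bytes, with no base64 expansion.
let encoder = PropertyListEncoder()
encoder.outputFormat = .binary
let binary = try! encoder.encode(note)

// For comparison: JSONEncoder base64-encodes the same 1,000 bytes.
let json = try! JSONEncoder().encode(note)
print("binary plist: \(binary.count) bytes, JSON: \(json.count) bytes")

let decoded = try! PropertyListDecoder().decode(Note.self, from: binary)
print(decoded == note) // true
```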

Aside: Other Formats

Here's a quick rundown of other supported formats (in order of size). In these tests, I used the very small string "Hello World! There's so much to see!" in the system font. After each format description (in parentheses) is the number of bytes to store that string.

  • Plain Text can store the above string in 36 bytes (1 for each character), but won't preserve attributes or attachments. (36 bytes)
  • RTF seems most lightweight if you need to preserve attributes but not attachments. (331 bytes)
  • HTML is the next lightest, but isn't really designed to be a storage format. In my experience, some attributes such as line spacing get lost when converted to HTML by NSAttributedString. (536 bytes)
  • Binary Plist, which is made when you use NSKeyedArchiver, is a fine option if you only need compatibility with Apple platforms and don't like the above formats. This format supports images too, but is generally still larger than the above (and RTFD). (648 bytes)
  • Web Archive is next for size, but I don't recommend using it as WebKit has deprecated it. Safari still uses it though for some things. (784 bytes)
  • Word ML is probably only useful for those who already know they need it. This format and all below it will generally have a bunch of boilerplate that will become a smaller percentage of the file as text is added. (~1.2 KB)
  • Open Document (OASIS) is smaller than most of the Word formats, but you probably wouldn't use it without a good reason. (~2.4 KB)
  • Office Open XML is another format you'd only use if you needed that format exactly. (~3.5 KB)
  • Doc (Microsoft Word) is very large in comparison for small amounts of text. While I would expect this format to allow images, in my testing the file size did not actually go up when I added one. (~19.4 KB)
  • Mac Simple Text seems to always generate an error. (N/A)
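To reproduce this kind of comparison on your own machine, something like the following should work on macOS. It covers only a subset of the formats above (the selection is arbitrary), and the exact byte counts will depend on the OS version and the attributes used.

```swift
import AppKit

let sample = NSAttributedString(
    string: "Hello World! There's so much to see!",
    attributes: [.font: NSFont.systemFont(ofSize: NSFont.systemFontSize)]
)
let range = NSRange(location: 0, length: sample.length)

// Subset of the document types listed above.
let formats: [(String, NSAttributedString.DocumentType)] = [
    ("Plain Text", .plain),
    ("RTF", .rtf),
    ("HTML", .html),
    ("Word ML", .wordML),
]

var sizes: [String: Int] = [:]
for (name, docType) in formats {
    var attributes: [NSAttributedString.DocumentAttributeKey: Any] = [.documentType: docType]
    if docType == .plain {
        // Pin the text encoding so the plain-text byte count is predictable
        attributes[.characterEncoding] = String.Encoding.utf8.rawValue
    }
    do {
        let data = try sample.data(from: range, documentAttributes: attributes)
        sizes[name] = data.count
        print("\(name): \(data.count) bytes")
    } catch {
        print("\(name): \(error)")
    }
}
```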

Final Note

In the end, the encoding experience for NSAttributedString should get better as Foundation continues to adapt to Swift rather than Objective-C. You can imagine a day when NSAttributedString or some similar Swifty type conforms to Codable out of the box and can then be paired with any file format's encoder.

Matthew Seaman answered Dec 19 '22 23:12