I am trying to optimize storage space when saving the contents of an NSTextView, namely its NSTextStorage property, itself an NSAttributedString.
Saving it as Data, for example using the rtfd(from:documentAttributes:) method, and as part of a Codable structure, results in a very large string, much larger than the content itself especially when inserting an image into the NSTextView. For example, inserting a 200K image will result in a 5MB JSON file.
Side note: It is even worse when the Data object is encoded directly rather than as a property of the encoded object, as it is encoded in the form of an array of small integers rather than an arbitrary string. I am not sure why, though I was able to prevent this by inserting the Data into a simple wrapper structure.
Strangely, compressing the actual JSON file using ZIP still results in a 4MB file, merely a 20% gain, so it is unclear to me how a 200K image could turn into such a massive, hardly compressible encoded string.
I would like to figure out what is the proper way to efficiently store NSAttributedString using the Codable protocol. Any hint or advice is much appreciated.
I am also wondering whether there is a valid binary encoding option for Codable.
TL;DR: RTFD encodes images as PNGs, but you can make it encode JPGs instead to save space. A custom format might be better and easier though if you have the time to create one.
NSAttributedString can encode to HTML, RTF, RTFD, plain text, a variety of Office/Word formats, etc. Given that each of these is an official format with an official spec that must be followed, there's not much that can be done in terms of saving space other than choosing the supported format that best fits your use case, OR designing your own custom format.
Of the supported formats, RTFD does indeed sound best for your use case because it includes support for attachments such as images. Feel free to try out the other included formats, which are described below under "Other Formats".
Saving it as Data, for example using the rtfd(from:documentAttributes:) method, and as part of a Codable structure, results in a very large string, much larger than the content itself especially when inserting an image into the NSTextView. For example, inserting a 200K image will result in a 5MB JSON file.
To understand what is happening here, try out the following code:
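import AppKit
do {
    let range = NSRange(location: 0, length: someAttributedString.length)
    // rtfdFileWrapper returns an optional FileWrapper; it does not throw
    let rtfd = someAttributedString.rtfdFileWrapper(from: range, documentAttributes: [:])
    // Writing the wrapper to disk is the call that can throw
    try rtfd?.write(to: URL(fileURLWithPath: "/Users/yourname/someFolder/RTFD.rtfd"), options: .atomic, originalContentsURL: nil)
} catch {
    print("\(error)")
}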
When you call rtfd(from:documentAttributes:), you're getting flat Data. This flat data can then be encoded somewhere and read back into NSAttributedString. But make no mistake: RTFD is a package format ("D" stands for directory). So by instead calling rtfdFileWrapper(from:documentAttributes:), and writing that to a URL with the rtfd extension, we can see the actual package format that rtfd(from:documentAttributes:) replicates, but as a directory instead of raw data. In Finder, right click the generated file and choose "Show Package Contents".
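If you would rather not write anything to disk, the same package can be inspected in memory through the FileWrapper. The following is only a minimal sketch, assuming macOS with AppKit; the helper name rtfdPackageSizes is made up for illustration and is not an AppKit API.

```swift
import AppKit

/// Returns the size in bytes of every file inside the RTFD package that
/// would be generated for `attributedString`, keyed by file name.
/// (`rtfdPackageSizes` is a hypothetical helper, not an AppKit API.)
func rtfdPackageSizes(for attributedString: NSAttributedString) -> [String: Int] {
    let range = NSRange(location: 0, length: attributedString.length)
    guard let package = attributedString.rtfdFileWrapper(from: range, documentAttributes: [:]),
          package.isDirectory,
          let children = package.fileWrappers else {
        return [:]
    }
    var sizes: [String: Int] = [:]
    for (name, child) in children where child.isRegularFile {
        sizes[name] = child.regularFileContents?.count ?? 0
    }
    return sizes
}

// Usage: print each file in the package with its size, largest first.
let sample = NSAttributedString(string: "Hello World!")
for (name, size) in rtfdPackageSizes(for: sample).sorted(by: { $0.value > $1.value }) {
    print("\(name): \(size) bytes")
}
```

Running this on a string that contains an image attachment should make it obvious which file in the package accounts for most of the size.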
The RTFD package contains an RTF file to specify the text and attributes, and a copy of each attachment. So why was your example so much bigger? In my tests, the answer seems to be that RTFD expects to find its images in PNG format. When calling rtfdFileWrapper(from:documentAttributes:) or rtfd(from:documentAttributes:), any image attachments seem to get written out as PNG files, which take up significantly more space. This happens because your image gets wrapped in an NSImage before getting wrapped in an NSTextAttachment. The NSImage is able to write the image data out in other formats, including larger formats like PNG.
I'm assuming the image you tried was in a compressed format like JPEG, and NSAttributedString wrote it to RTFD as PNG.
Using JPEG instead
Assuming you're okay with the image being compressed and not having info such as an alpha channel, you should be able to create an RTFD file with jpg images.
For example, I was able to get an RTFD file down to 2.8 MB from over 12 MB (large image) just by replacing the generated PNG image with the original JPG one. TextEdit initially refused to open the result, but once I changed the file extension of the image to .png (even though it is still a JPG), it accepted it.
In code it was even simpler. You may be able to get away with just changing how you add image attachments.
import AppKit
import CoreServices // for kUTTypeJPEG

// Don't do this unless you want PNG
let image = NSImage(contentsOf: ...) // NSImage will write to a larger PNG file
let pngAttachment = NSTextAttachment()
pngAttachment.image = image

// Do this if you want smaller files
let jpegData = try? Data(contentsOf: ...) // This will remain in raw JPG format
let jpegAttachment = NSTextAttachment(data: jpegData, ofType: kUTTypeJPEG as String) // Explicitly specify JPG
Then when you create a new NSAttributedString with that NSTextAttachment and append it to NSTextStorage, the RTFD data you write out will be significantly smaller.
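Put together, the attach-then-measure step could look roughly like the sketch below. It assumes macOS with AppKit; appendImageFile and flatRTFDSize are hypothetical helper names, and "public.jpeg" is the string value of kUTTypeJPEG.

```swift
import AppKit

/// Appends the image file at `url` to `storage` without routing it through
/// NSImage, so the attachment is created from the file's original bytes.
/// (`appendImageFile` is a hypothetical helper; the UTI defaults to JPEG.)
func appendImageFile(at url: URL, to storage: NSTextStorage, uti: String = "public.jpeg") throws {
    let data = try Data(contentsOf: url)
    let attachment = NSTextAttachment(data: data, ofType: uti)
    storage.append(NSAttributedString(attachment: attachment))
}

/// Size in bytes of the flat RTFD data for `attributedString`, or nil if
/// the conversion fails. (Also a hypothetical helper.)
func flatRTFDSize(of attributedString: NSAttributedString) -> Int? {
    let range = NSRange(location: 0, length: attributedString.length)
    return attributedString.rtfd(from: range, documentAttributes: [:])?.count
}
```

Calling flatRTFDSize(of:) once for a storage built this way and once for a storage built with the NSImage approach is a quick way to check whether the saving holds for your own images.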
Of course, you may not have control of this process if you're relying on Cocoa UI/API for attaching images. That could make the process more difficult and you may need to resort to modifying the generated data by swapping images.
The approach described immediately above might be inconvenient due to not having control of the attachment-adding process and needing flat data. In that case a custom format might be better.
There's nothing stopping you from designing your own format (binary, text, package, whatever) and then writing a coder for it. You could specify a specific image format or support a variety. It's up to you. And unless you're a fancy word processor, you probably don't need to store all the attributes like font all the time.
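As a rough idea of what such a format could look like, here is a deliberately small sketch. Every type and property name in it is hypothetical, and it only keeps bold/italic runs plus images in their original encoding; a real format would store whatever attributes your app actually needs.

```swift
import Foundation

/// An image kept in its original encoding. All names here are hypothetical.
struct StoredImage: Codable, Equatable {
    var location: Int   // character index where the image sits in `text`
    var uti: String     // e.g. "public.jpeg"
    var bytes: Data     // original file bytes, never re-encoded
}

/// A run of characters sharing the same (very limited) styling.
struct StyleRun: Codable, Equatable {
    var location: Int
    var length: Int
    var isBold: Bool
    var isItalic: Bool
}

/// The whole document: plain text, style runs, and images.
struct StoredDocument: Codable, Equatable {
    var text: String
    var runs: [StyleRun]
    var images: [StoredImage]
}

let document = StoredDocument(
    text: "Hello World!",
    runs: [StyleRun(location: 0, length: 5, isBold: true, isItalic: false)],
    images: [StoredImage(location: 12, uti: "public.jpeg", bytes: Data([0xFF, 0xD8, 0xFF, 0xD9]))]
)

// Any Codable-compatible encoder works; JSON is used here only for readability.
let encoded = try JSONEncoder().encode(document)
let decoded = try JSONDecoder().decode(StoredDocument.self, from: encoded)
print(decoded == document)
```

Because the model is plain Codable structs, the choice of on-disk encoding stays independent of the model itself.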
I am also wondering whether there is a valid binary encoding option for Codable.
First, note that NSAttributedString is an Objective-C class (when used on Apple platforms) and conforms to NSSecureCoding instead of Codable.
Note that you cannot extend NSAttributedString to conform to Codable, because the init(from:) requirement on Decodable can only be satisfied by guaranteeing that the initializer will be included on all subclasses as well. Since this class is non-final, that means it can only be satisfied by a required init. Required initializers can only be specified on the original declaration, not extensions.
For this reason, if you wanted to conform it to Codable, you would need to use a wrapper object. enumerateAttributes(in:options:using:) should be helpful for getting the attributes and raw characters that need to be encoded, but you'll need to be sure to pay attention to the images too.
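To show the shape of such a wrapper, here is a minimal sketch. Note that it takes a shortcut: rather than walking the string with enumerateAttributes(in:options:using:), it stores a secure-coded NSKeyedArchiver archive, relying on the NSSecureCoding conformance mentioned above. The type name AttributedStringBox and the key name archive are hypothetical.

```swift
import Foundation

/// Codable box around NSAttributedString. It stores a secure-coded
/// NSKeyedArchiver archive; the type and key names are hypothetical.
struct AttributedStringBox: Codable {
    let value: NSAttributedString

    private enum CodingKeys: String, CodingKey { case archive }

    init(_ value: NSAttributedString) { self.value = value }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let data = try container.decode(Data.self, forKey: .archive)
        guard let decoded = try NSKeyedUnarchiver.unarchivedObject(
            ofClass: NSAttributedString.self, from: data) else {
            throw DecodingError.dataCorruptedError(
                forKey: .archive, in: container,
                debugDescription: "Archive did not contain an NSAttributedString")
        }
        value = decoded
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        let data = try NSKeyedArchiver.archivedData(
            withRootObject: value, requiringSecureCoding: true)
        try container.encode(data, forKey: .archive)
    }
}

// Usage: round-trip through JSON.
let box = AttributedStringBox(NSAttributedString(string: "Hello"))
let json = try JSONEncoder().encode(box)
let restored = try JSONDecoder().decode(AttributedStringBox.self, from: json)
print(restored.value.string)
```

The trade-off is that the archive is opaque and Apple-only; a wrapper built on enumerateAttributes gives you control over exactly which attributes and image bytes are stored.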
As for encoding in binary, Codable is completely agnostic to format, so you could write your own types conforming to Encoder and Decoder that do whatever you want, including storing everything as raw bytes.
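Before writing a custom encoder, it may be worth trying the binary option Foundation already ships: PropertyListEncoder with its outputFormat set to .binary. The sketch below compares it with JSONEncoder, which by default writes Data as a base64 string. The Payload type is hypothetical and just stands in for any struct with a Data property.

```swift
import Foundation

// Hypothetical payload type standing in for "some struct with a Data property".
struct Payload: Codable, Equatable {
    var name: String
    var blob: Data
}

let payload = Payload(name: "image", blob: Data(repeating: 0xAB, count: 30_000))

// JSON has no byte type, so JSONEncoder writes Data as a base64 string by default.
let jsonData = try JSONEncoder().encode(payload)

// A binary property list can store the bytes without a text encoding step.
let plistEncoder = PropertyListEncoder()
plistEncoder.outputFormat = .binary
let plistData = try plistEncoder.encode(payload)

print("JSON: \(jsonData.count) bytes, binary plist: \(plistData.count) bytes")

// Decoding does not need to be told the format.
let roundTripped = try PropertyListDecoder().decode(Payload.self, from: plistData)
print(roundTripped == payload)
```

This only removes the overhead of the text encoding; it does nothing about the size of the image data itself, so it complements rather than replaces the JPEG approach.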
Here's a quick rundown of other supported formats (in order of size). In these tests, I used the very small string "Hello World! There's so much to see!" in the system font. After each format description (in parentheses) is the number of bytes to store that string.
NSAttributedString. (536 bytes)
NSKeyedArchiver is a fine option if you only need compatibility with Apple platforms and don't like the above formats. This format supports images too, but is generally still larger than the above (and RTFD). (648 bytes)
In the end, the encoding experience for NSAttributedString should get better as Foundation continues to adapt to Swift rather than Objective-C. You can imagine a day where NSAttributedString or some similar Swifty type conforms to Codable out of the box and can then be paired with an Encoder for any file format.
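If you want to reproduce byte counts like the ones in the format rundown for your own content, something along these lines should work. It is a sketch assuming macOS with AppKit; encodedSizes is a hypothetical helper, and the numbers you get will depend on the OS version and on the attributes in your string.

```swift
import AppKit

/// Encodes `attributedString` in each listed document type and returns the
/// byte counts, smallest first. Types that fail to encode are skipped.
/// (`encodedSizes` is a hypothetical helper, not an AppKit API.)
func encodedSizes(of attributedString: NSAttributedString,
                  as types: [NSAttributedString.DocumentType]) -> [(type: String, bytes: Int)] {
    let range = NSRange(location: 0, length: attributedString.length)
    var results: [(type: String, bytes: Int)] = []
    for type in types {
        if let data = try? attributedString.data(
            from: range, documentAttributes: [.documentType: type]) {
            results.append((type: type.rawValue, bytes: data.count))
        }
    }
    return results.sorted { $0.bytes < $1.bytes }
}

// Usage: compare a handful of the built-in formats.
let sample = NSAttributedString(string: "Hello World! There's so much to see!")
let types: [NSAttributedString.DocumentType] = [
    .plain, .rtf, .rtfd, .html, .docFormat, .officeOpenXML, .openDocument, .webArchive
]
for entry in encodedSizes(of: sample, as: types) {
    print("\(entry.type): \(entry.bytes) bytes")
}
```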