-[NSMutableAttributedString initWithHTML:documentAttributes:]
seems to mangle special characters:
NSString *html = @"“Hello” World"; // notice the smart quotes
NSData *htmlData = [html dataUsingEncoding:NSUTF8StringEncoding];
NSMutableAttributedString *as = [[NSMutableAttributedString alloc] initWithHTML:htmlData documentAttributes:nil];
NSLog(@"%@", as);
That prints “Hello†World
followed by some RTF commands. In my application, I convert the attributed string to RTF and display it in an NSTextView
, but the characters are corrupted there, too.
According to the documentation, the default encoding is UTF-8, but I tried being explicit and the result is the same:
NSDictionary *attributes = @{NSCharacterEncodingDocumentAttribute: [NSNumber numberWithInt:NSUTF8StringEncoding]};
NSMutableAttributedString *as = [[NSMutableAttributedString alloc] initWithHTML:htmlData documentAttributes:&attributes];
Use [html dataUsingEncoding:NSUnicodeStringEncoding]
when creating the NSData and set the matching encoding option when you parse the HTML into an attributed string:
The documentation for NSCharacterEncodingDocumentAttribute
is slightly confusing:
NSNumber, containing an int specifying the
NSStringEncoding
for the file; for reading and writing plain text files and writing HTML; default for plain text is the default encoding; default for HTML is UTF-8.
So, you code should be:
NSString *html = @"“Hello” World";
NSData *htmlData = [html dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary *options = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)};
NSMutableAttributedString *as =
[[NSMutableAttributedString alloc] initWithHTML:htmlData
options: options
documentAttributes:nil];
The previous answer here works, but mostly by accident.
Making an NSData
with NSUnicodeStringEncoding
will tend to work, because that constant is an alias for NSUTF16StringEncoding
, and UTF-16 is pretty easy for the system to identify. Easier than UTF-8, which apparently was being identified as some other superset of ASCII (it looks like NSWindowsCP1252StringEncoding
in your case, probably because it's one of the few ASCII-based encodings with mappings for 0x8_ and 0x9_).
That answer is mistaken in quoting the documentation for NSCharacterEncodingDocumentAttribute
, because "attributes" are what you get out of -initWithHTML
. That's why it's NSDictionary **
and not just NSDictionary *
. You can pass in a pointer to an NSDictionary *
, and you'll get out keys like TopMargin/BottomMargin/LeftMargin/RightMargin, PaperSize, DocumentType, UTI, etc. Any values you try to pass in through the "attributes" dictionary are ignored.
You need to use "options" for passing values in, and the relevant option key is NSTextEncodingNameDocumentOption
, which has no documented default value. It's passing the bytes to WebKit for parsing, so if you don't specify an encoding, presumably you're getting WebKit's encoding-guessing heuristics.
To guarantee the encoding types match between your NSData
and NSAttributedString
, what you should do is something like:
NSString *html = @"“Hello” World";
NSData *htmlData = [html dataUsingEncoding:NSUTF8StringEncoding];
NSMutableAttributedString *as =
[[NSMutableAttributedString alloc] initWithHTML:htmlData
options:@{NSTextEncodingNameDocumentOption: @"UTF-8"}
documentAttributes:nil];
Swift version of accepted answer is:
let htmlString: String = "Hello world contains html</br>"
let data: Data = Data(htmlString.utf8)
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
let attributedString = try? NSAttributedString(data: data,
options: options,
documentAttributes: nil)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With