
Convert NSString with unicode characters into valid HTML

I am getting a string from an API that has anchor tags in it, so I am creating an NSAttributedString from it, and displaying it in a UITextView so I can support tappable links.

The problem is that the incoming string isn't a complete, valid HTML document, and it contains raw (unescaped) Unicode characters. Things like:

  • HORIZONTAL ELLIPSIS Unicode: U+2026, UTF-8: E2 80 A6
  • EM DASH Unicode: U+2014, UTF-8: E2 80 94

While I could deal with those specific cases, I'm concerned about any other Unicode characters that might come in that I don't know about yet.

Example:

// \u2014 is EM DASH; the \U form needs eight hex digits (\U00002014)
NSString *fromAPI = @"Reagan \u2014 saying";
NSDictionary *options = @{NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType};
NSData *data = [fromAPI dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:NO];
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:data options:options documentAttributes:nil error:nil];

In the UITextView, the em dash does not render correctly.

How do I get it to render the em dash and other Unicode characters properly?

jamone asked Jun 17 '14 15:06


1 Answer

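Found it. It looks like the HTML importer doesn't treat the data as UTF-8 unless the document declares its charset, so Unicode characters won't render correctly until you add this to the <head>: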

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
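
A minimal sketch of how that fix could be applied to the string from the question, assuming the API returns only a fragment with no <html> or <head> of its own. The helper name AttributedStringFromHTMLFragment is hypothetical and not part of any framework. Passing NSCharacterEncodingDocumentAttribute in the options is an extra precaution rather than something the original fix relied on.

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: wraps an HTML fragment in a minimal document that
// declares UTF-8, then imports it as an attributed string.
static NSAttributedString *AttributedStringFromHTMLFragment(NSString *fragment,
                                                            NSError **error) {
    // The fragment is passed as a format argument, so any '%' it contains is safe.
    NSString *html = [NSString stringWithFormat:
        @"<html><head>"
        @"<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />"
        @"</head><body>%@</body></html>", fragment];

    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    if (data == nil) {
        return nil;
    }

    NSDictionary *options = @{
        NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType,
        // Assumption: stating the encoding here as well is redundant with the
        // meta tag, but harmless.
        NSCharacterEncodingDocumentAttribute : @(NSUTF8StringEncoding)
    };

    return [[NSAttributedString alloc] initWithData:data
                                            options:options
                                 documentAttributes:nil
                                              error:error];
}
```

Usage would then look something like the following, where textView is assumed to be an existing UITextView. The HTML import should be run on the main thread.

```objc
NSError *error = nil;
NSAttributedString *attributedString =
    AttributedStringFromHTMLFragment(@"Reagan \u2014 saying", &error);
if (attributedString != nil) {
    textView.attributedText = attributedString;
}
```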
jamone answered Oct 15 '22 11:10