Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NSJSONSerialization serialization of a string containing forward slashes / and HTML is escaped incorrectly

I am trying to convert some simple HTML into a string value in a JSON object and I'm having trouble getting the string encoding to not escape the string in NSJSONSerialization.

Example... I have a string which contains some basic HTML text:

NSString *str = @"<html><body><p>Samples / Text</p></body></html>";

The desired outcome is JSON with HTML as the value:

{
    "Title":"My Title",
    "Instructions":"<html><body><p>Samples / Text</p></body></html>"
}

I'm using the standard technique to convert an NSDictionary to a NSString containing JSON:

NSMutableDictionary *dict = [NSMutableDictionary dictionary];
[dict setObject:str forKey:@"Instructions"];
[dict setObject:@"My Title" forKey:@"Title"];

NSError *err;
NSData *data = [NSJSONSerialization dataWithJSONObject:dict options:NSJSONWritingPrettyPrinted error:&err];
NSString *resultingString = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
NSLog(@"%@", resultingString);

The JSON produced by this method is valid, however the HTML has all forward slashes escaped:

{
    "Title":"My Title",
    "Instructions":"<html><body><p>Samples \/ Text<\/p><\/body><\/html>"
}

This creates invalid HTML in the instructions JSON string.

I'd like to stick with NSJSONSerialization since we're using that everywhere else in our framework and I've been burned before switching to non-Apple libraries as they get desupported. I've tried many different string encodings and all of them escape the angle brackets.

Apparently \/ is a valid representation in JavaScript for the / characters, which is why forward slash is escaped (even the StackOverflow text editor escaped it). See: escaping json string with a forward slash? and also JSON: why are forward slashes escaped?. I just don't want it to do that and there doesn't seem to be a way to stop iOS from escaping forward slashes in string values when serializing.

like image 710
JasonD Avatar asked Apr 03 '13 17:04

JasonD


1 Answers

I believeNSJSONSerialization is behaving as designed in regards to encoding HTML.

If you look at some questions (1, 2) on encoding HTML in JSON you'll see the answers always mention escaping the forward slashes.

JSON doesn't require forward slashes to be escaped, but HTML doesn't allow a javascript string to contain </ as it can be confused with the end of the <SCRIPT> tag.

See the answers here, here and most directly the w3.org HTML4 Appendix which states in B.3.2 Specifying non-HTML data

ILLEGAL EXAMPLE: 
The following script data incorrectly contains a "</" sequence (as part of "</EM>") before the SCRIPT end tag:

<SCRIPT type="text/javascript">
  document.write ("<EM>This won't work</EM>")
</SCRIPT>

Although this behaviour may cause issues for you NSJSONSerialisation is just playing by the age old rules of encoding HTML data for use in <SCRIPT> tags.

like image 64
Jessedc Avatar answered Oct 12 '22 13:10

Jessedc