
Why is the return value of String.addingPercentEncoding() optional?

The signature of the String method for percent-escaping is:

func addingPercentEncoding(withAllowedCharacters: CharacterSet)
    -> String?

(This was stringByAddingPercentEncodingWithAllowedCharacters in Swift 2.)

Why does this method return an optional?

The documentation says that the method returns nil “if the transformation is not possible,” but it's unclear under what circumstances the escaping transformation could fail:

  • Characters are escaped using UTF-8, which is a complete Unicode encoding. Any valid Unicode character can be encoded using UTF-8, and thus can be escaped.

  • I thought perhaps the method applied some kind of sanity check for bad interactions between the set of allowed chars and the chars used for escaping, but this is not the case: the method succeeds no matter whether the set of allowed chars contains "%", and also succeeds if the allowed char set is empty.
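The two observations above can be checked with a short sketch. The sample string and variable names are only illustrative; the sketch confirms that both calls return a non-nil value and makes no claim about why the return type is optional.

```swift
import Foundation

let sample = "50% off!"

// Allowed set that contains "%" itself: the call still succeeds.
let percentAllowed = CharacterSet.alphanumerics
    .union(CharacterSet(charactersIn: "%"))
let withPercentAllowed =
    sample.addingPercentEncoding(withAllowedCharacters: percentAllowed)

// Empty allowed set: no character is exempt from escaping,
// and the call still succeeds.
let withNothingAllowed =
    sample.addingPercentEncoding(withAllowedCharacters: CharacterSet())

print(withPercentAllowed ?? "nil")
print(withNothingAllowed ?? "nil")
```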

As it stands, the optional return value appears to force a nonsensical error check.

Paul Cantrell asked Nov 06 '15 03:11




2 Answers

I filed a bug report with Apple about this, and heard back — with a very helpful response, no less!

Turns out (much to my surprise) that it’s possible to successfully create Swift strings that contain invalid Unicode in the form of unpaired UTF-16 surrogate chars. Such a string can cause UTF-8 encoding to fail. Here’s some code that illustrates this behavior:

import Foundation

// Succeeds (wat?!): 0xD800 is a lone high surrogate with no low surrogate after it.
let str = String(
    bytes: [0xD8, 0x00] as [UInt8],
    encoding: .utf16BigEndian)!

// Returns nil:
let escaped = str.addingPercentEncoding(withAllowedCharacters: .alphanumerics)
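
If the nil case has to be handled rather than force-unwrapped, one possible approach is to retry on a repaired copy of the string. This is a minimal sketch, not anything Apple recommended in the bug response: percentEncodedOrRepaired is a hypothetical helper name, the default allowed set is an arbitrary choice, and the sketch assumes that substituting U+FFFD for the broken surrogate is acceptable for your use case.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): percent-encodes `input`;
// if Foundation reports failure, retries on a copy in which ill-formed
// UTF-16 has been replaced by U+FFFD (the replacement character).
func percentEncodedOrRepaired(
    _ input: String,
    allowed: CharacterSet = .urlQueryAllowed
) -> String? {
    if let encoded = input.addingPercentEncoding(withAllowedCharacters: allowed) {
        return encoded
    }
    // String(decoding:as:) repairs unpaired surrogates while decoding.
    let repaired = String(decoding: Array(input.utf16), as: UTF16.self)
    return repaired.addingPercentEncoding(withAllowedCharacters: allowed)
}

print(percentEncodedOrRepaired("a b") ?? "nil")
```

The helper still returns an optional, because the second attempt goes through the same Foundation method; callers that want a non-optional result would need to pick their own fallback.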
Paul Cantrell answered Oct 22 '22 18:10


Building on Paul Cantrell's answer, here is a small demonstration that the same method can also return nil in Objective-C, even though String and NSString are different beasts when it comes to encodings:

uint8_t bytes[2] = { 0xD8, 0x00 };
NSString *string = [[NSString alloc] initWithBytes:bytes length:2 encoding:NSUTF16BigEndianStringEncoding];
// \ud800
NSLog(@"%@", string);

NSString *escapedString = [string stringByAddingPercentEncodingWithAllowedCharacters:NSCharacterSet.URLHostAllowedCharacterSet];
// (null)
NSLog(@"%@", escapedString);

For fun, https://r12a.github.io/app-conversion/ percent-escapes the same input as:

Error%20in%20convertUTF162Char%3A%20low%20surrogate%20expected%2C%20b%3D0%21%00

Cœur answered Oct 22 '22 18:10