NSCharacterSet.characterIsMember() with Swift's Character type

Tags: swift

Imagine you've got an instance of Swift's Character type, and you want to determine whether it's a member of an NSCharacterSet. NSCharacterSet's characterIsMember method takes a unichar, so we need to get from Character to unichar.

The only solution I could come up with is the following, where c is my Character:

let u: unichar = ("\(c)" as NSString).characterAtIndex(0)
if characterSet.characterIsMember(u) {
    dude.abide()
}

I looked at Character but nothing leapt out at me as a way to get from it to unichar. This may be because Character is more general than unichar, so a direct conversion wouldn't be safe, but I'm only guessing.
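
The guess above can be checked directly. The following is a minimal sketch, written in current Swift syntax rather than the Swift 1.x used elsewhere in this thread; `utf16Length(of:)` is a hypothetical helper name, not part of any library. It counts how many UTF-16 code units (that is, `unichar` values) a single `Character` occupies:

```swift
// Hypothetical helper: number of UTF-16 code units (unichar values) in a Character.
func utf16Length(of character: Character) -> Int {
    return String(character).utf16.count
}

let plain: Character = "a"
let accented: Character = "e\u{301}"   // "e" followed by a combining acute accent
let emoji: Character = "\u{1F600}"     // a scalar outside the Basic Multilingual Plane

print(utf16Length(of: plain))     // 1
print(utf16Length(of: accented))  // 2
print(utf16Length(of: emoji))     // 2 (a surrogate pair)
```

Any `Character` with a count above 1 has no single `unichar` equivalent, which is presumably why the standard library offers no direct conversion.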

If I were iterating a whole string, I'd do something like this:

let s = myString as NSString
for i in 0..<s.length {   // NSString indices count UTF-16 code units, not Characters
    let u = s.characterAtIndex(i)
    if characterSet.characterIsMember(u) {
        dude.abide()
    }
}

(Warning: The above is pseudocode and has never been run by anyone ever.) But this is not really what I'm asking.

Gregory Higley asked Dec 29 '14 23:12


3 Answers

My understanding is that unichar is a typealias for UInt16. A unichar is just a number.

I think the problem you are facing is that a Character in Swift can be composed of more than one Unicode code point, and therefore of more than one UTF-16 code unit. It cannot always be converted to a single unichar value, because it may need two or more unichars. You can decompose a Character into its individual unichar values by converting it to a String and using the utf16 property, like this:

let c: Character = "a"
let s = String(c)
var codeUnits = [unichar]()
for codeUnit in s.utf16 {
    codeUnits.append(codeUnit)
}

This will produce an array - codeUnits - of unichar values.

EDIT: Initial code had for codeUnit in s when it should have been for codeUnit in s.utf16

You can tidy things up and test for whether or not each individual unichar value is in a character set like this:

let char: Character = "\u{63}\u{20dd}" // This is a 'c' inside of an enclosing circle
for codeUnit in String(char).utf16 {
    if NSCharacterSet(charactersInString: "c").characterIsMember(codeUnit) {
        dude.abide()
    } // dude will abide() for codeUnits[0] = "c", but not for codeUnits[1] = 0x20dd (the enclosing circle)
}

Or, if you are only interested in the first (and often only) unichar value:

if NSCharacterSet(charactersInString: "c").characterIsMember(String(char).utf16[0]) {
    dude.abide()
}

Or, wrap it in a function:

func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
    return set.characterIsMember(String(char).utf16[0])
}

let xSet = NSCharacterSet(charactersInString: "x")
isChar("x", inSet: xSet)  // This returns true
isChar("y", inSet: xSet)  // This returns false

Now make the function check all of the unichar values in a composed character. That way, for a composed character, the function returns true only if both the base character and the combining character are in the set:

func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
    var found = true
    for ch in String(char).utf16 {
        if !set.characterIsMember(ch) { found = false }
    }
    return found
}

let acuteA: Character = "\u{e1}"                   // An "a" with an accent
let acuteAComposed: Character = "\u{61}\u{301}"    // Also an "a" with an accent

// A character set that includes both the composed and uncomposed unichar values
let charSet = NSCharacterSet(charactersInString: "\u{61}\u{301}\u{e1}")

isChar(acuteA, inSet: charSet)           // returns true
isChar(acuteAComposed, inSet: charSet)   // returns true (both unichar values were matched)

The last version is important. If your Character is a composed character you have to check for the presence of both the base character ("a") and the combining character (the acute accent) in the character set or you will get false positives.
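
To make the false positive concrete, here is a minimal sketch in current Swift syntax. The helper names `firstUnitIsMember` and `allUnitsAreMembers` are hypothetical and exist only for this illustration; they mirror the two versions of isChar above. Note that this approach still works on UTF-16 code units, so I would not expect it to give meaningful results for characters that need a surrogate pair.

```swift
import Foundation

// Hypothetical helper: tests only the first UTF-16 code unit of the Character.
func firstUnitIsMember(_ character: Character, of set: NSCharacterSet) -> Bool {
    guard let first = String(character).utf16.first else { return false }
    return set.characterIsMember(first)
}

// Hypothetical helper: requires every UTF-16 code unit to be a member of the set.
func allUnitsAreMembers(_ character: Character, of set: NSCharacterSet) -> Bool {
    return String(character).utf16.allSatisfy { set.characterIsMember($0) }
}

let circledC: Character = "\u{63}\u{20dd}"   // "c" plus a combining enclosing circle
let onlyC = NSCharacterSet(charactersIn: "c")

let firstOnly = firstUnitIsMember(circledC, of: onlyC)   // true: the false positive
let allUnits = allUnitsAreMembers(circledC, of: onlyC)   // false: the circle is not in the set
```

The first check reports a match even though the circled "c" is a different character from a plain "c"; the second does not.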

Aaron Rasmussen answered Nov 13 '22 16:11


I would treat the Character as a String and let Cocoa do all the work:

func charset(cset:NSCharacterSet, containsCharacter c:Character) -> Bool {
    let s = String(c)
    let ix = s.startIndex
    let ix2 = s.endIndex
    let result = s.rangeOfCharacterFromSet(cset, options: nil, range: ix..<ix2)
    return result != nil
}

And here's how to use it:

let cset = NSCharacterSet.lowercaseLetterCharacterSet()
let c : Character = "c"
let ok = charset(cset, containsCharacter:c) // true
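
The code above is Swift 1.x. A rough equivalent in current Swift might look like the sketch below; the function name `characterSet(_:containsAnyScalarOf:)` is my own invention for illustration. As I understand it, rangeOfCharacter(from:) reports a match as soon as any part of the string is in the set, so this asks "does any piece of the Character match?" rather than "do all pieces match?".

```swift
import Foundation

// Hypothetical helper: true if any part of the Character's String form is found in the set.
func characterSet(_ set: CharacterSet, containsAnyScalarOf character: Character) -> Bool {
    return String(character).rangeOfCharacter(from: set) != nil
}

let lowercase = CharacterSet.lowercaseLetters
let lowerC = characterSet(lowercase, containsAnyScalarOf: "c")   // true
let upperC = characterSet(lowercase, containsAnyScalarOf: "C")   // false
let digit = characterSet(lowercase, containsAnyScalarOf: "1")    // false
```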

matt answered Nov 13 '22 16:11


Do it all in a one-liner:

validCharacterSet.contains(String(char).unicodeScalars.first!)

(Swift 3. Note that this tests only the first Unicode scalar of the character.)
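
If every scalar of the character should be in the set, the one-liner can be extended along these lines. This is a sketch assuming Swift 4.2 or later (for allSatisfy); `isMember(_:of:)` is a hypothetical name. Because it works on Unicode scalars instead of UTF-16 code units, it does not have to deal with surrogate pairs.

```swift
import Foundation

// Hypothetical helper: true only if every Unicode scalar of the Character is in the set.
func isMember(_ character: Character, of set: CharacterSet) -> Bool {
    return String(character).unicodeScalars.allSatisfy { set.contains($0) }
}

let composedA: Character = "\u{61}\u{301}"                        // "a" plus a combining acute accent
let baseOnly = CharacterSet(charactersIn: "\u{61}")               // contains only "a"
let baseAndAccent = CharacterSet(charactersIn: "\u{61}\u{301}")   // contains "a" and the accent

let partial = isMember(composedA, of: baseOnly)        // false: the accent is missing from the set
let complete = isMember(composedA, of: baseAndAccent)  // true: both scalars are present
```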

Rien answered Nov 13 '22 16:11