
NSArray from NSCharacterSet

Currently I am able to make an array of letters like this:

[[NSArray alloc]initWithObjects:@"A",@"B",@"C",@"D",@"E",@"F",@"G",@"H",@"I",@"J",@"K",@"L",@"M",@"N",@"O",@"P",@"Q",@"R",@"S",@"T",@"U",@"V",@"W",@"X",@"Y",@"Z",nil]; 

I know that the same characters are available through:

[NSCharacterSet uppercaseLetterCharacterSet] 

How can I make an array out of it?

Saran asked Apr 01 '13 10:04


2 Answers

The following code creates an array containing all characters of a given character set. It also works for characters outside of the "basic multilingual plane" (characters > U+FFFF, e.g. U+10400 DESERET CAPITAL LETTER LONG I).

NSCharacterSet *charset = [NSCharacterSet uppercaseLetterCharacterSet];
NSMutableArray *array = [NSMutableArray array];
for (int plane = 0; plane <= 16; plane++) {
    if ([charset hasMemberInPlane:plane]) {
        UTF32Char c;
        for (c = plane << 16; c < (plane+1) << 16; c++) {
            if ([charset longCharacterIsMember:c]) {
                UTF32Char c1 = OSSwapHostToLittleInt32(c); // To make it byte-order safe
                NSString *s = [[NSString alloc] initWithBytes:&c1 length:4 encoding:NSUTF32LittleEndianStringEncoding];
                [array addObject:s];
            }
        }
    }
}

For uppercaseLetterCharacterSet this gives an array of 1467 elements. Note, however, that characters > U+FFFF are stored as a UTF-16 surrogate pair in NSString, so for example U+10400 is actually stored in NSString as the 2 characters "\uD801\uDC00".
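The surrogate-pair remark can be checked directly in Swift. This is only a minimal sketch: the variable names are illustrative, and it relies on the standard library's UTF-16 view rather than on NSString itself (the number of UTF-16 code units is what NSString reports as its length).

```swift
// U+10400 DESERET CAPITAL LETTER LONG I, the example used above.
let scalar = Unicode.Scalar(0x10400 as UInt32)!
let s = String(Character(scalar))

// One Unicode scalar, one Swift Character...
print(s.unicodeScalars.count)
print(s.count)

// ...but two UTF-16 code units (a surrogate pair).
let units = Array(s.utf16)
print(units.map { String($0, radix: 16, uppercase: true) })
```

So an array element built from such a character is a single string whose UTF-16 length is 2, not 1.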

Swift 2 code can be found in other answers to this question. Here is a Swift 3 version, written as an extension method:

extension CharacterSet {
    func allCharacters() -> [Character] {
        var result: [Character] = []
        for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
            for unicode in UInt32(plane) << 16 ..< UInt32(plane + 1) << 16 {
                if let uniChar = UnicodeScalar(unicode), self.contains(uniChar) {
                    result.append(Character(uniChar))
                }
            }
        }
        return result
    }
}

Example:

let charset = CharacterSet.uppercaseLetters
let chars = charset.allCharacters()
print(chars.count) // 1521
print(chars) // ["A", "B", "C", ...]

(Note that some characters may not be present in the font used to display the result.)
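As the counts above show, uppercaseLetters covers far more than the 26 letters listed in the question. If only "A" through "Z" are wanted, a range over the ASCII code points may be simpler than enumerating a character set. This is a sketch of that alternative, not part of the answer's method, and the name asciiUppercase is made up for illustration.

```swift
// Build ["A", "B", ..., "Z"] from the ASCII code points 65...90.
let asciiUppercase: [String] = (UInt8(ascii: "A")...UInt8(ascii: "Z")).map {
    String(Character(Unicode.Scalar($0)))
}

print(asciiUppercase.count)
print(asciiUppercase.joined())
```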

Martin R answered Sep 18 '22 08:09


Inspired by Satachito's answer, here is a performant way to make an Array from a CharacterSet using bitmapRepresentation:

extension CharacterSet {
    func characters() -> [Character] {
        // A Unicode scalar is any Unicode code point in the range U+0000 to U+D7FF inclusive or U+E000 to U+10FFFF inclusive.
        return codePoints().compactMap { UnicodeScalar($0) }.map { Character($0) }
    }

    func codePoints() -> [Int] {
        var result: [Int] = []
        var plane = 0
        // following documentation at https://developer.apple.com/documentation/foundation/nscharacterset/1417719-bitmaprepresentation
        for (i, w) in bitmapRepresentation.enumerated() {
            let k = i % 0x2001
            if k == 0x2000 {
                // plane index byte
                plane = Int(w) << 13
                continue
            }
            let base = (plane + k) << 3
            for j in 0 ..< 8 where w & 1 << j != 0 {
                result.append(base + j)
            }
        }
        return result
    }
}

Example for uppercaseLetters

let charset = CharacterSet.uppercaseLetters
let chars = charset.characters()
print(chars.count) // 1733
print(chars) // ["A", "B", "C", ...]

Example for discontinuous planes

let charset = CharacterSet(charactersIn: "𝚨󌞑")
let codePoints = charset.codePoints()
print(codePoints) // [120488, 837521]
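To make the bit layout that codePoints() walks a little more concrete, here is a sketch for plane 0 only. It assumes the layout described in the linked documentation: the first 8192 bytes cover the basic multilingual plane, and bit n & 7 of byte n >> 3 marks code point n. The helper bitmapPosition is hypothetical and the bitmap is built by hand, so this does not call Foundation at all.

```swift
// Hypothetical helper: where a BMP code point lives in the first 8192 bytes.
func bitmapPosition(ofBMPCodePoint n: UInt16) -> (byteOffset: Int, mask: UInt8) {
    return (byteOffset: Int(n) >> 3, mask: UInt8(1) << UInt8(n & 7))
}

// Hand-built plane-0 bitmap containing only "A" (65) and "Z" (90).
var bitmap = [UInt8](repeating: 0, count: 0x2000)
for scalar in "AZ".unicodeScalars {
    let (offset, mask) = bitmapPosition(ofBMPCodePoint: UInt16(scalar.value))
    bitmap[offset] |= mask
}

// Decode it again: byte offset * 8 + bit index gives the code point.
var decoded: [Int] = []
for (offset, byte) in bitmap.enumerated() where byte != 0 {
    for bit in 0 ..< 8 where (byte & (1 << bit)) != 0 {
        decoded.append((offset << 3) + bit)
    }
}
print(decoded)
```

The extension above does the same decoding, with the extra step of reading the plane index byte that follows each 8192-byte block.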

Performance

Performance depends on the data and usage, but it can be very good: built in release mode, this bitmapRepresentation solution seems to be 2 to 10 times faster than Martin R's solution using contains or Oliver Atkinson's solution using longCharacterIsMember.

Be sure to benchmark against your own needs. Performance is best compared in a non-debug build, so avoid measuring in a Playground.
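For a rough comparison of your own, a small timing helper can be enough. The sketch below is an assumption about how one might measure, not the benchmark behind the numbers above; measure is a made-up name and the summed range is only a placeholder workload. Replace the closure with calls to the extension methods you want to compare and build in release mode.

```swift
import Dispatch

// Hypothetical helper: run a closure once and report elapsed wall time.
func measure<T>(_ body: () -> T) -> (result: T, nanoseconds: UInt64) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = body()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, end - start)
}

// Placeholder workload; swap in e.g. the character-set enumeration under test.
let (sum, ns) = measure { (0 ..< 1_000_000).reduce(0, +) }
print("sum = \(sum), elapsed = \(ns) ns")
```

A single run is noisy, so repeating each measurement several times and comparing the best or median time gives a fairer picture.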

Cœur answered Sep 18 '22 08:09