
How can I create a String from UTF8 in Swift?

We know we can print the UTF-8 code units of each character in a String. If we have those code units, how can we create a String from them?

jxwho asked Jun 28 '14 09:06




2 Answers

It's possible to convert UTF-8 code units to a Swift String idiomatically using the standard library's UTF8 codec type, although it's much easier to convert from a String to UTF-8!

    // Written against Swift 1.x: generate(), do-while and the capitalized
    // enum cases were all renamed in later Swift versions.
    import Foundation
    import XCTest

    public class UTF8Encoding {
      // Builds a String from UTF-8 code units.
      public static func encode(bytes: Array<UInt8>) -> String {
        var encodedString = ""
        var decoder = UTF8()
        var generator = bytes.generate()
        var finished: Bool = false
        do {
          let decodingResult = decoder.decode(&generator)
          switch decodingResult {
          case .Result(let char):
            encodedString.append(char)
          case .EmptyInput:
            finished = true
          /* ignore errors and unexpected values */
          case .Error:
            finished = true
          default:
            finished = true
          }
        } while (!finished)
        return encodedString
      }

      // Returns the UTF-8 code units of a String.
      public static func decode(str: String) -> Array<UInt8> {
        var decodedBytes = Array<UInt8>()
        for b in str.utf8 {
          decodedBytes.append(b)
        }
        return decodedBytes
      }
    }

    func testUTF8Encoding() {
      let testString = "A UTF8 String With Special Characters: 😀🍎"
      let decodedArray = UTF8Encoding.decode(testString)
      let encodedString = UTF8Encoding.encode(decodedArray)
      XCTAssert(encodedString == testString, "UTF8Encoding is lossless: \(encodedString) != \(testString)")
    }

Of the other alternatives suggested:

  • Using NSString invokes the Objective-C bridge;

  • Using UnicodeScalar is error-prone because it converts UnicodeScalars directly to Characters, ignoring complex grapheme clusters; and

  • Using String.fromCString is potentially unsafe as it uses pointers.

Tim WB answered Oct 06 '22 13:10



With Swift 5, you can choose one of the following ways in order to convert a collection of UTF-8 code units into a string.


#1. Using String's init(_:) initializer

If you have a String.UTF8View instance (i.e. a collection of UTF-8 code units) and want to convert it to a string, you can use the init(_:) initializer. init(_:) has the following declaration:

init(_ utf8: String.UTF8View) 

Creates a string corresponding to the given sequence of UTF-8 code units.

The Playground sample code below shows how to use init(_:):

    let string = "Café 🇫🇷"
    let utf8View: String.UTF8View = string.utf8

    let newString = String(utf8View)
    print(newString) // prints: Café 🇫🇷

#2. Using String's init(decoding:as:) initializer

init(decoding:as:) creates a string from the given Unicode code units collection in the specified encoding:

    let string = "Café 🇫🇷"
    let codeUnits: [Unicode.UTF8.CodeUnit] = Array(string.utf8)

    let newString = String(decoding: codeUnits, as: UTF8.self)
    print(newString) // prints: Café 🇫🇷

Note that init(decoding:as:) also accepts a String.UTF8View argument:

    let string = "Café 🇫🇷"
    let utf8View: String.UTF8View = string.utf8

    let newString = String(decoding: utf8View, as: UTF8.self)
    print(newString) // prints: Café 🇫🇷
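One point worth illustrating is what init(decoding:as:) does with input that is not valid UTF-8: it does not fail, it repairs ill-formed sequences by substituting U+FFFD (the replacement character). The following is a minimal sketch; the byte values are made up for illustration and are not part of the original answer:

```swift
// "C", then 0xFF (a byte that never occurs in well-formed UTF-8), then "a".
// These bytes are an illustrative assumption.
let malformed: [UInt8] = [0x43, 0xFF, 0x61]

// init(decoding:as:) is non-failable: the invalid byte is replaced
// with U+FFFD instead of causing an error.
let repaired = String(decoding: malformed, as: UTF8.self)

print(repaired == "C\u{FFFD}a")      // prints: true
print(repaired.unicodeScalars.count) // prints: 3
```

If silently repaired text is not acceptable, a strict approach such as #5 or #6 may be a better fit.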

#3. Using the transcode(_:from:to:stoppingOnError:into:) function

The following example transcodes the UTF-8 representation of an initial string into Unicode scalar values (UTF-32 code units) that can be used to build a new string:

    let string = "Café 🇫🇷"
    let bytes = Array(string.utf8)

    var newString = ""
    _ = transcode(bytes.makeIterator(), from: UTF8.self, to: UTF32.self, stoppingOnError: true, into: {
        newString.append(String(Unicode.Scalar($0)!))
    })
    print(newString) // prints: Café 🇫🇷

#4. Using Array's withUnsafeBufferPointer(_:) method and String's init(cString:) initializer

init(cString:) has the following declaration:

init(cString: UnsafePointer<CChar>) 

Creates a new string by copying the null-terminated UTF-8 data referenced by the given pointer.

The following example shows how to use init(cString:) with a pointer to the content of a CChar array (i.e. a well-formed, null-terminated UTF-8 code unit sequence) in order to create a string from it:

    let bytes: [CChar] = [67, 97, 102, -61, -87, 32, -16, -97, -121, -85, -16, -97, -121, -73, 0]

    let newString = bytes.withUnsafeBufferPointer({ (bufferPointer: UnsafeBufferPointer<CChar>) in
        return String(cString: bufferPointer.baseAddress!)
    })
    print(newString) // prints: Café 🇫🇷

#5. Using Unicode.UTF8's decode(_:) method

To decode a code unit sequence, call decode(_:) repeatedly until it returns UnicodeDecodingResult.emptyInput:

    let string = "Café 🇫🇷"
    let codeUnits = Array(string.utf8)

    var codeUnitIterator = codeUnits.makeIterator()
    var utf8Decoder = Unicode.UTF8()
    var newString = ""

    Decode: while true {
        switch utf8Decoder.decode(&codeUnitIterator) {
        case .scalarValue(let value):
            newString.append(Character(Unicode.Scalar(value)))
        case .emptyInput:
            break Decode
        case .error:
            print("Decoding error")
            break Decode
        }
    }

    print(newString) // prints: Café 🇫🇷
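The decode(_:) loop can also be wrapped in a reusable function that rejects ill-formed input instead of printing a message. This is only a sketch: the name decodeUTF8 is hypothetical (it is not a standard library API), and returning nil on the first error is one possible policy among several:

```swift
/// Hypothetical helper, not part of the standard library.
/// Returns nil as soon as the decoder reports ill-formed UTF-8.
func decodeUTF8<S: Sequence>(_ bytes: S) -> String? where S.Element == UInt8 {
    var iterator = bytes.makeIterator()
    var decoder = Unicode.UTF8()
    var result = ""

    while true {
        switch decoder.decode(&iterator) {
        case .scalarValue(let scalar):
            // Append at the scalar level; String forms grapheme clusters itself.
            result.unicodeScalars.append(scalar)
        case .emptyInput:
            return result
        case .error:
            return nil
        }
    }
}

let wellFormed: [UInt8] = [0x43, 0x61, 0x66, 0xC3, 0xA9] // "Café"
let truncated: [UInt8] = [0x43, 0xC3]                    // lead byte without its continuation byte

print(decodeUTF8(wellFormed) ?? "invalid") // prints: Café
print(decodeUTF8(truncated) ?? "invalid")  // prints: invalid
```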

#6. Using String's init(bytes:encoding:) initializer

Foundation gives String an init(bytes:encoding:) initializer that you can use as indicated in the Playground sample code below:

    import Foundation

    let string = "Café 🇫🇷"
    let bytes: [Unicode.UTF8.CodeUnit] = Array(string.utf8)

    let newString = String(bytes: bytes, encoding: String.Encoding.utf8)
    print(String(describing: newString)) // prints: Optional("Café 🇫🇷")
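Because init(bytes:encoding:) is failable, the result usually needs unwrapping. The sketch below shows one way to handle both outcomes; the byte arrays and the fallback text are illustrative assumptions, not part of the original answer:

```swift
import Foundation

let validBytes: [UInt8] = [0x43, 0x61, 0x66, 0xC3, 0xA9] // "Café"
let invalidBytes: [UInt8] = [0x43, 0xFF]                 // 0xFF is never valid in UTF-8

// Optional binding unwraps the result when the bytes are well-formed.
if let text = String(bytes: validBytes, encoding: .utf8) {
    print(text) // prints: Café
}

// For ill-formed input the initializer returns nil, so a fallback is used here.
let fallback = String(bytes: invalidBytes, encoding: .utf8) ?? "<invalid UTF-8>"
print(fallback) // prints: <invalid UTF-8>
```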
Imanou Petit answered Oct 06 '22 12:10
