We know we can print each character of a String as UTF-8 code units. But if we already have the code units of those characters, how can we create a String from them?
In Java, converting a String to UTF-8 is done with the getBytes() method, which encodes the String into a sequence of bytes and returns a byte array. In the getBytes(String charsetName) overload, charsetName names the charset used to encode the String into the byte array.
To convert an Int value to a String value in Swift, use String(). String() accepts an integer as its argument and returns a String value created from that integer.
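A minimal illustration of that conversion; the variable names are ours, and the radix overload is shown only as a related standard-library initializer:

```swift
let count = 42
let text = String(count)          // decimal representation
let hex = String(255, radix: 16)  // same idea with an explicit radix

print(text)  // prints: 42
print(hex)   // prints: ff
```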
Swift 5 switches the preferred encoding of strings from UTF-16 to UTF-8 while preserving efficient Objective-C interoperability.
UTF-8 encodes a character into a binary string of one, two, three, or four bytes. UTF-16 encodes a Unicode character into a string of either two or four bytes. This distinction is evident from their names.
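The byte counts above can be checked directly from a string's utf8 and utf16 views. This is a small sketch; the sample scalars and the tuple labels are our own choice, one scalar for each UTF-8 length:

```swift
// Each sample is a single Unicode scalar, written as an escape so the
// source file's own encoding cannot affect the result.
let samples: [(name: String, text: String)] = [
    ("U+0041 LATIN CAPITAL LETTER A", "\u{41}"),
    ("U+00E9 LATIN SMALL LETTER E WITH ACUTE", "\u{E9}"),
    ("U+20AC EURO SIGN", "\u{20AC}"),
    ("U+1F600 GRINNING FACE", "\u{1F600}"),
]

for sample in samples {
    let utf8Bytes = sample.text.utf8.count
    let utf16Bytes = sample.text.utf16.count * 2  // a UTF-16 code unit is 2 bytes
    print("\(sample.name): \(utf8Bytes) UTF-8 byte(s), \(utf16Bytes) UTF-16 byte(s)")
}
```

The four samples take 1, 2, 3 and 4 bytes in UTF-8, and 2, 2, 2 and 4 bytes in UTF-16.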
It's possible to convert UTF-8 code units to a Swift String idiomatically using Swift's UTF8 codec type, although it's much easier to convert from String to UTF-8! (The code below is updated to Swift 5 syntax.)
    import Foundation
    import XCTest

    public class UTF8Encoding {
        // Builds a String from UTF-8 code units.
        public static func encode(bytes: [UInt8]) -> String {
            var encodedString = ""
            var decoder = UTF8()
            var iterator = bytes.makeIterator()
            var finished = false
            repeat {
                switch decoder.decode(&iterator) {
                case .scalarValue(let scalar):
                    encodedString.unicodeScalars.append(scalar)
                case .emptyInput:
                    finished = true
                case .error:
                    // Ignore errors: stop at the first invalid sequence.
                    finished = true
                }
            } while !finished
            return encodedString
        }

        // Returns the UTF-8 code units of a String.
        public static func decode(str: String) -> [UInt8] {
            return Array(str.utf8)
        }
    }

    func testUTF8Encoding() {
        let testString = "A UTF8 String With Special Characters: 😀🍎"
        let decodedArray = UTF8Encoding.decode(str: testString)
        let encodedString = UTF8Encoding.encode(bytes: decodedArray)
        XCTAssert(encodedString == testString, "UTF8Encoding is lossless: \(encodedString) != \(testString)")
    }
Of the other alternatives suggested:
Using NSString invokes the Objective-C bridge;
Using UnicodeScalar is error-prone because it converts UnicodeScalars directly to Characters, ignoring complex grapheme clusters; and
Using String.fromCString is potentially unsafe as it uses pointers.
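The grapheme-cluster concern can be made concrete with a flag emoji, which is one Character made of two Unicode scalars. The sketch below is our own illustration, not code from the answers being compared:

```swift
// One flag emoji: two regional-indicator scalars forming a single grapheme cluster.
let flag = "\u{1F1EB}\u{1F1F7}"

// Pitfall: creating one Character per scalar splits the cluster in two.
let perScalar = flag.unicodeScalars.map { Character($0) }

// Appending through the unicodeScalars view lets String regroup the cluster.
var rebuilt = ""
for scalar in flag.unicodeScalars {
    rebuilt.unicodeScalars.append(scalar)
}

print(perScalar.count)  // prints: 2
print(rebuilt.count)    // prints: 1
```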
With Swift 5, you can choose one of the following ways in order to convert a collection of UTF-8 code units into a string.
String's init(_:) initializer
If you have a String.UTF8View instance (i.e. a collection of UTF-8 code units) and want to convert it to a string, you can use the init(_:) initializer. init(_:) has the following declaration:
init(_ utf8: String.UTF8View)
Creates a string corresponding to the given sequence of UTF-8 code units.
The Playground sample code below shows how to use init(_:):
    let string = "Café 🇫🇷"
    let utf8View: String.UTF8View = string.utf8
    let newString = String(utf8View)
    print(newString) // prints: Café 🇫🇷
String's init(decoding:as:) initializer
init(decoding:as:) creates a string from the given collection of Unicode code units in the specified encoding:
    let string = "Café 🇫🇷"
    let codeUnits: [Unicode.UTF8.CodeUnit] = Array(string.utf8)
    let newString = String(decoding: codeUnits, as: UTF8.self)
    print(newString) // prints: Café 🇫🇷
Note that init(decoding:as:) also works with a String.UTF8View parameter:
    let string = "Café 🇫🇷"
    let utf8View: String.UTF8View = string.utf8
    let newString = String(decoding: utf8View, as: UTF8.self)
    print(newString) // prints: Café 🇫🇷
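One behaviour worth knowing about init(decoding:as:) is that it never fails: ill-formed input is repaired with the replacement character U+FFFD instead of being rejected. A minimal sketch, using a made-up byte array that ends with 0xFF (a byte that never occurs in UTF-8):

```swift
// "Caf" followed by an invalid byte (hypothetical input for illustration).
let invalidBytes: [UInt8] = [0x43, 0x61, 0x66, 0xFF]

// The initializer is non-failable; the bad byte becomes U+FFFD.
let repaired = String(decoding: invalidBytes, as: UTF8.self)

print(repaired == "Caf\u{FFFD}")  // prints: true
```

If invalid input should be rejected rather than repaired, one of the failable or error-reporting approaches is a better fit.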
transcode(_:from:to:stoppingOnError:into:) function
The following example transcodes the UTF-8 representation of an initial string into Unicode scalar values (UTF-32 code units) that can be used to build a new string:
    let string = "Café 🇫🇷"
    let bytes = Array(string.utf8)
    var newString = ""
    _ = transcode(bytes.makeIterator(), from: UTF8.self, to: UTF32.self, stoppingOnError: true, into: {
        newString.append(String(Unicode.Scalar($0)!))
    })
    print(newString) // prints: Café 🇫🇷
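The example above discards the Bool that transcode returns, which reports whether an encoding error was detected in the input. A possible way to use it is sketched below; decodeStrict is a hypothetical helper name of ours, not a standard-library function:

```swift
// Hypothetical helper: returns nil when the bytes are not valid UTF-8.
func decodeStrict(_ bytes: [UInt8]) -> String? {
    var scalars = String.UnicodeScalarView()
    let hadError = transcode(
        bytes.makeIterator(),
        from: UTF8.self,
        to: UTF32.self,
        stoppingOnError: true,
        into: { codeUnit in
            // Each UTF-32 code unit corresponds to one Unicode scalar value.
            if let scalar = Unicode.Scalar(codeUnit) {
                scalars.append(scalar)
            }
        }
    )
    return hadError ? nil : String(scalars)
}

print(decodeStrict(Array("Caf\u{E9}".utf8)) ?? "invalid")  // prints: Café
print(decodeStrict([0x43, 0xFF]) ?? "invalid")             // prints: invalid
```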
Array's withUnsafeBufferPointer(_:) method and String's init(cString:) initializer
init(cString:) has the following declaration:
init(cString: UnsafePointer<CChar>)
Creates a new string by copying the null-terminated UTF-8 data referenced by the given pointer.
The following example shows how to use init(cString:) with a pointer to the content of a null-terminated CChar array (i.e. a well-formed UTF-8 code unit sequence) in order to create a string from it:
    let bytes: [CChar] = [67, 97, 102, -61, -87, 32, -16, -97, -121, -85, -16, -97, -121, -73, 0]
    let newString = bytes.withUnsafeBufferPointer({ (bufferPointer: UnsafeBufferPointer<CChar>) in
        return String(cString: bufferPointer.baseAddress!)
    })
    print(newString) // prints: Café 🇫🇷
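Rather than typing signed byte values by hand, a null-terminated CChar buffer can be obtained from an existing string through its utf8CString property. The round trip below is a sketch of that idea; the variable names are ours:

```swift
let original = "Caf\u{E9}"

// utf8CString is a ContiguousArray<CChar> that already ends with a 0 terminator.
let cString = original.utf8CString

let roundTripped = cString.withUnsafeBufferPointer { buffer in
    // baseAddress is non-nil here because the buffer holds at least the terminator.
    String(cString: buffer.baseAddress!)
}

print(cString.count)             // prints: 6 (5 UTF-8 bytes plus the terminator)
print(roundTripped == original)  // prints: true
```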
Unicode.UTF8's decode(_:) method
To decode a code unit sequence, call decode(_:) repeatedly until it returns UnicodeDecodingResult.emptyInput:
    let string = "Café 🇫🇷"
    let codeUnits = Array(string.utf8)
    var codeUnitIterator = codeUnits.makeIterator()
    var utf8Decoder = Unicode.UTF8()
    var newString = ""
    Decode: while true {
        switch utf8Decoder.decode(&codeUnitIterator) {
        case .scalarValue(let value):
            newString.append(Character(Unicode.Scalar(value)))
        case .emptyInput:
            break Decode
        case .error:
            print("Decoding error")
            break Decode
        }
    }
    print(newString) // prints: Café 🇫🇷
String's init(bytes:encoding:) initializer
Foundation gives String an init(bytes:encoding:) initializer that you can use as indicated in the Playground sample code below. It returns an optional String:
    import Foundation
    let string = "Café 🇫🇷"
    let bytes: [Unicode.UTF8.CodeUnit] = Array(string.utf8)
    let newString = String(bytes: bytes, encoding: String.Encoding.utf8)
    print(String(describing: newString)) // prints: Optional("Café 🇫🇷")
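Because init(bytes:encoding:) returns an optional, the result is usually unwrapped before use, and invalid input shows up as nil. A minimal sketch, where describe is a hypothetical helper and the broken byte array is made up for illustration:

```swift
import Foundation

let validBytes: [UInt8] = Array("Caf\u{E9}".utf8)
let brokenBytes: [UInt8] = [0x43, 0x61, 0x66, 0xFF]  // 0xFF is never valid in UTF-8

// Hypothetical helper: unwraps the optional result or reports the failure.
func describe(_ bytes: [UInt8]) -> String {
    if let decoded = String(bytes: bytes, encoding: .utf8) {
        return decoded
    }
    return "<invalid UTF-8>"
}

print(describe(validBytes))   // prints: Café
print(describe(brokenBytes))  // prints: <invalid UTF-8>
```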