I want to write a function that could be used like this:
let ๐ฉโ๐ฉโ๐งโ๐ฆ = "๐ฉโ๐ฉโ๐งโ๐ง".replacingFirstOccurrence(of: "๐ง", with: "๐ฆ")
Given how odd both this string and Swift's String
library are, is this possible in Swift?
An extended grapheme cluster is a group of one or more Unicode scalar values that approximates a single user-perceived character. Many individual characters, such as โรฉโ, โ๊นโ, and โ๐ฎ๐ณโ, can be made up of multiple Unicode scalar values.
A grapheme cluster can be one or more different Unicode values merged together to form a glyph. A character in Swift is a grapheme cluster, not a Unicode value. And the same cluster can be represented in different ways. This is called canonical equivalence.
Based on the insights gained at Why are emoji characters like ๐ฉโ๐ฉโ๐งโ๐ฆ treated so strangely in Swift strings?, a sensible approach might be to replace Unicode scalars:
extension String {
func replacingFirstOccurrence(of target: UnicodeScalar, with replacement: UnicodeScalar) -> String {
let uc = self.unicodeScalars
guard let idx = uc.index(of: target) else { return self }
let prefix = uc[uc.startIndex..<idx]
let suffix = uc[uc.index(after: idx) ..< uc.endIndex]
return "\(prefix)\(replacement)\(suffix)"
}
}
Example:
let family1 = "๐ฉโ๐ฉโ๐งโ๐ฆ"
print(family1.characters.map { Array(String($0).unicodeScalars) })
// [["\u{0001F469}", "\u{200D}"], ["\u{0001F469}", "\u{200D}"], ["\u{0001F467}", "\u{200D}"], ["\u{0001F466}"]]
let family2 = family1.replacingFirstOccurrence(of: "๐ง", with: "๐ฆ")
print(family2) // ๐ฉโ๐ฉโ๐ฆโ๐ฆ
print(family2.characters.map { Array(String($0).unicodeScalars) })
// [["\u{0001F469}", "\u{200D}"], ["\u{0001F469}", "\u{200D}"], ["\u{0001F466}", "\u{200D}"], ["\u{0001F466}"]]
And here is a possible version which locates and replaces the Unicode scalars of an arbitrary string:
extension String {
func replacingFirstOccurrence(of target: String, with replacement: String) -> String {
let uc = self.unicodeScalars
let tuc = target.unicodeScalars
// Target empty or too long:
if tuc.count == 0 || tuc.count > uc.count {
return self
}
// Current search position:
var pos = uc.startIndex
// Last possible position of `tuc` within `uc`:
let end = uc.index(uc.endIndex, offsetBy: tuc.count - 1)
// Locate first Unicode scalar
while let from = uc[pos..<end].index(of: tuc.first!) {
// Compare all Unicode scalars:
let to = uc.index(from, offsetBy: tuc.count)
if !zip(uc[from..<to], tuc).contains(where: { $0 != $1 }) {
let prefix = uc[uc.startIndex..<from]
let suffix = uc[to ..< uc.endIndex]
return "\(prefix)\(replacement)\(suffix)"
}
// Next search position:
uc.formIndex(after: &pos)
}
// Target not found.
return self
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With