How to get numbers characters from a string? I don't want to convert in Int
.
var string = "string_1"
var string2 = "string_20_certified"
My result have to be formatted like this:
newString = "1"
newString2 = "20"
To find the duplicate character from the string, we count the occurrence of each character in the string. If count is greater than 1, it implies that a character has a duplicate entry in the string. In above example, the characters highlighted in green are duplicate characters.
Take a string str. Take n as integer, ch as character and length of str as integer. Function occurrences_char(string str, int length, int n, char ch) takes str, ch, n and length of str and returns the count of ch in first n characters in repeated string str.
In order to find occurence of each character in a string we can use Map utility of Java.In Map a key could not be duplicate so make each character of string as key of Map and provide initial value corresponding to each key as 1 if this character does not inserted in map before.
String
's unicode scalars against Western Arabic NumeralsYou could pattern match the unicodeScalars
view of a String
to a given UnicodeScalar
pattern (covering e.g. Western Arabic numerals).
extension String {
var westernArabicNumeralsOnly: String {
let pattern = UnicodeScalar("0")..."9"
return String(unicodeScalars
.flatMap { pattern ~= $0 ? Character($0) : nil })
}
}
Example usage:
let str1 = "string_1"
let str2 = "string_20_certified"
let str3 = "a_1_b_2_3_c34"
let newStr1 = str1.westernArabicNumeralsOnly
let newStr2 = str2.westernArabicNumeralsOnly
let newStr3 = str3.westernArabicNumeralsOnly
print(newStr1) // 1
print(newStr2) // 20
print(newStr3) // 12334
The unicode scalar pattern matching approach above is particularly useful extending it to matching any of a several given patterns, e.g. patterns describing different variations of Eastern Arabic numerals:
extension String {
var easternArabicNumeralsOnly: String {
let patterns = [UnicodeScalar("\u{0660}")..."\u{0669}", // Eastern Arabic
"\u{06F0}"..."\u{06F9}"] // Perso-Arabic variant
return String(unicodeScalars
.flatMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
}
}
This could be used in practice e.g. if writing an Emoji filter, as ranges of unicode scalars that cover emojis can readily be added to the patterns
array in the Eastern Arabic example above.
UnicodeScalar
patterns approach over Character
ones?A Character
in Swift contains of an extended grapheme cluster, which is made up of one or more Unicode scalar values. This means that Character
instances in Swift does not have a fixed size in the memory, which means random access to a character within a collection of sequentially (/contiguously) stored character will not be available at O(1), but rather, O(n).
Unicode scalars in Swift, on the other hand, are stored in fixed sized UTF-32 code units, which should allow O(1) random access. Now, I'm not entirely sure if this is a fact, or a reason for what follows: but a fact is that if benchmarking the methods above vs equivalent method using the CharacterView
(.characters
property) for some test String
instances, its very apparent that the UnicodeScalar
approach is faster than the Character
approach; naive testing showed a factor 10-25 difference in execution times, steadily growing for growing String
size.
Knowing the limitations of working with Unicode scalars vs Characters in Swift
Now, there are drawbacks using the UnicodeScalar
approach, however; namely when working with characters that cannot represented by a single unicode scalar, but where one of its unicode scalars are contained in the pattern to which we want to match.
E.g., consider a string holding the four characters "Café"
. The last character, "é"
, is represented by two unicode scalars, "e"
and "\u{301}"
. If we were to implement pattern matching against, say, UnicodeScalar("a")...e
, the filtering method as applied above would allow one of the two unicode scalars to pass.
extension String {
var onlyLowercaseLettersAthroughE: String {
let patterns = [UnicodeScalar("1")..."e"]
return String(unicodeScalars
.flatMap { uc in patterns.contains{ $0 ~= uc } ? Character(uc) : nil })
}
}
let str = "Cafe\u{301}"
print(str) // Café
print(str.onlyLowercaseLettersAthroughE) // Cae
/* possibly we'd want "Ca" or "Caé"
as result here */
In the particular use case queried by from the OP in this Q&A, the above is not an issue, but depending on the use case, it will sometimes be more appropriate to work with Character
pattern matching over UnicodeScalar
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With