
Check if a string is Latin or Cyrillic

Tags:

ios

swift

Is there a way to check whether a string is Latin or Cyrillic? I've tried the String method localizedCompare, but it didn't give me the result I needed.

Ookey asked Aug 02 '16 13:08



2 Answers

What about something like this?

extension String {
    var isLatin: Bool {
        let upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        let lower = "abcdefghijklmnopqrstuvwxyz"

        // Every character must be one of the basic Latin letters.
        for c in self {
            if !upper.contains(c) && !lower.contains(c) {
                return false
            }
        }

        return true
    }

    var isCyrillic: Bool {
        // Russian alphabet; add more letters to cover other Cyrillic alphabets.
        let upper = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
        let lower = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

        // Every character must be one of the listed Cyrillic letters.
        for c in self {
            if !upper.contains(c) && !lower.contains(c) {
                return false
            }
        }

        return true
    }

    var isBothLatinAndCyrillic: Bool {
        return self.isLatin && self.isCyrillic
    }
}

Usage:

let s = "Hello"
if s.isLatin && !s.isBothLatinAndCyrillic {
    // String is latin
} else if s.isCyrillic && !s.isBothLatinAndCyrillic {
    // String is cyrillic
} else if s.isBothLatinAndCyrillic {
    // String can be either latin or cyrillic
} else {
    // String is neither Latin nor Cyrillic
}

Consider that there are cases where the given string could be both, for example the string:

let s = "A"

It can be either Latin or Cyrillic, which is why there is the "is both" property.

And it can also be none of them:

let s = "*"
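
As a quick check on the lookalike case above ("A"), the sketch below lists the code points of a Latin "A" and a Cyrillic "А". The two letters render almost identically but are distinct Unicode scalars, so which property returns true depends on which character was actually typed. The helper name `codePoints` is hypothetical and not part of the extension above.

```swift
// Hypothetical helper: lists the Unicode scalar values of a string in U+XXXX form.
func codePoints(_ s: String) -> [String] {
    return s.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Left-pad to at least four hex digits, as in the usual U+XXXX notation.
        let padded = String(repeating: "0", count: max(0, 4 - hex.count)) + hex
        return "U+" + padded
    }
}

let latinA = "A"            // Latin capital letter A
let cyrillicA = "\u{0410}"  // Cyrillic capital letter A

print(codePoints(latinA))     // ["U+0041"]
print(codePoints(cyrillicA))  // ["U+0410"]
print(latinA == cyrillicA)    // false
```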
Vladimir Nul answered Oct 16 '22 22:10



You should get the Unicode scalars of the string and decide whether it contains Cyrillic or Latin characters based on their Unicode values. This code is not complete; you can finish it.

let a: String = "ӿ" // Unicode value: U+04FF
let scalars = a.unicodeScalars

// Get the Unicode value of the first character:
let unicodeValue = scalars[scalars.startIndex].value  // 1279, which corresponds to 0x04FF

See here for all Unicode values (in hexadecimal): http://jrgraphix.net/r/Unicode/0400-04FF

According to this site, Cyrillic values range from 0400 to 04FF (1024 to 1279 in decimal).
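
The hexadecimal and decimal bounds quoted above can be verified directly. This is only a sanity check; `cyrillicBlock` is an illustrative name, not something from the answer.

```swift
// Illustrative name: the Cyrillic block as a closed range of scalar values.
let cyrillicBlock: ClosedRange<UInt32> = 0x0400...0x04FF

print(cyrillicBlock.lowerBound)  // 1024
print(cyrillicBlock.upperBound)  // 1279

// U+04FF is the last scalar in the block.
let lastValue = "\u{04FF}".unicodeScalars.first!.value
print(cyrillicBlock.contains(lastValue))  // true
```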

This is the code for the Cyrillic check:

var isCyrillic = true
for unicode in scalars {
    // Any scalar outside 0x0400...0x04FF (1024...1279) is not in the Cyrillic block.
    if unicode.value < 1024 || unicode.value > 1279 {
        print("not a cyrillic text")
        print(unicode.value)
        isCyrillic = false
        break
    }
}
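
One way to finish the idea is to wrap both range checks in a String extension. This is a minimal sketch in current Swift syntax, under a few assumptions that are not in the answer: the property names `isCyrillicOnly` and `isBasicLatinLettersOnly` are hypothetical, only the Cyrillic block (U+0400 to U+04FF) and the ASCII letters A-Z and a-z are covered, and an empty string counts as neither.

```swift
extension String {
    /// Hypothetical name. True when the string is non-empty and every scalar
    /// lies in the Cyrillic block U+0400...U+04FF.
    var isCyrillicOnly: Bool {
        let cyrillic: ClosedRange<UInt32> = 0x0400...0x04FF
        return !isEmpty && unicodeScalars.allSatisfy { cyrillic.contains($0.value) }
    }

    /// Hypothetical name. True when the string is non-empty and every scalar
    /// is an ASCII letter (A-Z or a-z).
    var isBasicLatinLettersOnly: Bool {
        let upper: ClosedRange<UInt32> = 0x41...0x5A  // "A"..."Z"
        let lower: ClosedRange<UInt32> = 0x61...0x7A  // "a"..."z"
        return !isEmpty && unicodeScalars.allSatisfy {
            upper.contains($0.value) || lower.contains($0.value)
        }
    }
}

print("Привет".isCyrillicOnly)          // true
print("Hello".isBasicLatinLettersOnly)  // true
print("Hello".isCyrillicOnly)           // false
print("Привет, мир".isCyrillicOnly)     // false: the comma and space are outside the block
```

Spaces, digits and punctuation make both properties return false, just as in the loop above, so strip them first if mixed text should be accepted.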
Duyen-Hoa answered Oct 16 '22 20:10
