Finding Unicode script of a Char in Haskell

I wanted to write a function checking that a Char represents a Cyrillic letter, purely for pedagogical reasons. The simple approximation for Russian is

import Data.Char (toLower)

isCyrillic :: Char -> Bool
isCyrillic c =
    let lc = toLower c
    in 'а' <= lc && lc <= 'я'

but I don't like it because it doesn't handle other Cyrillic-using languages. I could hardcode the ranges:

U+0400–U+04FF Cyrillic
U+0500–U+052F Cyrillic Supplement
U+2DE0–U+2DFF Cyrillic Extended-A
U+A640–U+A69F Cyrillic Extended-B
U+1C80–U+1C8F Cyrillic Extended-C

but this doesn't seem good practice either.
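For concreteness, the hardcoded-range approach could be sketched roughly as below, using only base. The names cyrillicRanges and isCyrillicByRange are made up for illustration, and the table simply copies the five blocks listed above, so it would need updating by hand if later Unicode versions add more Cyrillic blocks.

```haskell
import Data.Char (ord)

-- Inclusive code-point ranges of the Cyrillic blocks listed above
-- (hypothetical helper; the list is hand-maintained).
cyrillicRanges :: [(Int, Int)]
cyrillicRanges =
  [ (0x0400, 0x04FF) -- Cyrillic
  , (0x0500, 0x052F) -- Cyrillic Supplement
  , (0x1C80, 0x1C8F) -- Cyrillic Extended-C
  , (0x2DE0, 0x2DFF) -- Cyrillic Extended-A
  , (0xA640, 0xA69F) -- Cyrillic Extended-B
  ]

-- True if the character's code point falls in any of the ranges.
-- This tests block membership only, so it also accepts non-letter
-- characters in those blocks, such as combining marks.
isCyrillicByRange :: Char -> Bool
isCyrillicByRange c = any inRange cyrillicRanges
  where
    n = ord c
    inRange (lo, hi) = lo <= n && n <= hi
```

Keeping the ranges in a single list at least confines the hardcoding to one place, though it remains a manual copy of the Unicode block table.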

Ideally the function would be just

isCyrillic c = unicodeScript c == Cyrillic

but this assumes the existence of a type enumerating Unicode scripts (Unicode ranges would do as well). Is there one somewhere?

Alexey Romanov asked Mar 06 '23 17:03


1 Answer

property from text-icu's Data.Text.ICU.Char seems to fit the bill:

import Data.Text.ICU.Char

isCyrillic c = property Block c == Cyrillic

duplode answered Mar 16 '23 23:03