I need to detect whether a codepoint is an uppercase letter in Elixir. I have tried checking whether its value is in the range 65..90,
but this fails on non-Latin uppercase letters. I have also tried checking whether
String.upcase(cp) == cp
but this is also true for non-letters (i.e. numbers, punctuation).
I really don't want to go through the entirety of Unicode and create a list of uppercase codepoints. Is there a built-in function for this?
You can use the \p{Lu} Unicode character property regex escape sequence to match any uppercase letter:
iex(1)> "a" =~ ~r/^\p{Lu}$/u
false
iex(2)> "A" =~ ~r/^\p{Lu}$/u
true
iex(3)> "π" =~ ~r/^\p{Lu}$/u
false
iex(4)> "Π" =~ ~r/^\p{Lu}$/u
true
iex(5)> "!" =~ ~r/^\p{Lu}$/u
false
Make sure you pass the u flag to turn on Unicode matching in the regex.
You can find more information about the supported properties on this page. Search for the heading "Unicode character properties" on the page.
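Since the question starts from an integer codepoint rather than a string, the check above can be wrapped in a small helper. This is only a sketch: the module name UpperCheck and the function upper?/1 are made up for illustration, and it assumes cp is a valid Unicode scalar value (the <<cp::utf8>> construction raises an ArgumentError otherwise).

```elixir
defmodule UpperCheck do
  # Hypothetical helper, not part of the standard library.
  # Encodes the integer codepoint as a one-character UTF-8 string and
  # tests it against the Unicode "Lu" (uppercase letter) category.
  def upper?(cp) when is_integer(cp) do
    String.match?(<<cp::utf8>>, ~r/^\p{Lu}$/u)
  end
end
```

For example, UpperCheck.upper?(?Π) gives the same answer as the "Π" session line above, and UpperCheck.upper?(?!) the same as the "!" one.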
I think you could use something like this, where cp is the integer codepoint:
<<cp::utf8>> != String.downcase(<<cp::utf8>>)
There may be a better way, but that's a start.
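As a function, the comparison could look like the sketch below. The names CaseCheck and changes_on_downcase?/1 are invented for this example. Note that this tests "has a different lowercase form", which is not quite the same as "is an uppercase letter": as far as I know, titlecase letters also change when downcased, and an uppercase letter with no lowercase mapping would be missed.

```elixir
defmodule CaseCheck do
  # Hypothetical helper, not part of the standard library.
  # Returns true when downcasing the codepoint produces a different string.
  # Digits and punctuation are unchanged by String.downcase/1, so they
  # return false, unlike the String.upcase(cp) == cp check in the question.
  def changes_on_downcase?(cp) when is_integer(cp) do
    char = <<cp::utf8>>
    char != String.downcase(char)
  end
end
```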