Equivalently, how can I typespec for a "single" UTF8 char?
Within a type definition, I can have a generic "any string" or "any single byte" with
@type tile :: String.t # matches any string
@type tile :: <<_::8>> # matches any single byte
but it seems I can't require the first bit to be 0:
@type tile :: <<0::1, _::7>>
The type for a single UTF-8 byte sequence would be
@type tile :: <<0::1, _::7>> |
<<6::3, _::5, 2::2, _::6>> |
<<14::4, _::4, 2::2, _::6, 2::2, _::6>> |
<<30::5, _::3, 2::2, _::6, 2::2, _::6, 2::2, _::6>>
(these bit patterns match when using pattern matching, for instance
<<14::4, _::4, 2::2, _::6, 2::2, _::6>> = "○"
succeeds.)
But when used in typespecs, the compiler complains greatly with
== Compilation error in file lib/board.ex ==
** (ArgumentError) argument error
(elixir) lib/kernel/typespec.ex:1000: Kernel.Typespec.typespec/3
(elixir) lib/kernel/typespec.ex:1127: anonymous fn/4 in Kernel.Typespec.typespec/3
(elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
(elixir) lib/kernel/typespec.ex:1127: Kernel.Typespec.typespec/3
(elixir) lib/kernel/typespec.ex:828: anonymous fn/4 in Kernel.Typespec.typespec/3
(elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
(elixir) lib/kernel/typespec.ex:828: Kernel.Typespec.typespec/3
(elixir) lib/kernel/typespec.ex:470: Kernel.Typespec.translate_type/3
Is there any way to typespec to some bit pattern like this?
You cannot write a typespec for the bit pattern inside a binary, only for the fact that a value is a binary (and, at most, its size). Even if you could define such specs, I do not believe Dialyzer is sophisticated enough to find failures in such matches. You are left with implementing this behaviour at runtime using guards and pattern matches, like:
>
def unicode?(<<0::size(1), _::size(7)>>), do: true
def unicode?(<<6::3, _::5, 2::2, _::6>>), do: true
def unicode?(<<14::4, _::4, 2::2, _::6, 2::2, _::6>>), do: true
def unicode?(<<30::5, _::3, 2::2, _::6, 2::2, _::6, 2::2, _::6>>), do: true
def unicode?(str) when is_binary(str), do: false
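For comparison, here is a minimal sketch that pairs a size-only type (as far as I know, the most the typespec binary syntax can express) with a runtime check. The module name `Tile` and the function `single_char?/1` are hypothetical. Instead of the hand-written bit patterns it relies on the built-in `utf8` binary modifier, which, to my knowledge, also rejects overlong encodings and surrogate code points:

```elixir
defmodule Tile do
  # Typespecs can constrain only the size of a binary, not its bits, so
  # "one to four bytes" is the closest approximation of one UTF-8 character.
  @type t :: <<_::8>> | <<_::16>> | <<_::24>> | <<_::32>>

  # Runtime check: the `utf8` modifier matches exactly one encoded code point,
  # and the pattern requires that nothing follows it.
  @spec single_char?(term()) :: boolean()
  def single_char?(<<_::utf8>>), do: true
  def single_char?(_other), do: false
end

Tile.single_char?("○")       # one code point encoded in three bytes
Tile.single_char?("ab")      # two code points
Tile.single_char?(<<0xFF>>)  # not a valid UTF-8 sequence
```

Dialyzer would still accept any 1 to 4 byte binary as a `Tile.t`, so the function is what actually enforces the constraint.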
Unfortunately, as far as I know, there is no way to use bit patterns in guards. You can only extract whole bytes using binary_part/3; there is no equivalent function for bits. So the nearest you could get is something like this (untested, so it may not work or even compile, but it gives a general idea of what is possible):
defguardp is_valid_utf_part(code) when code in 0b10000000..0b10111111
defguard is_unicode(<<ascii>>) when ascii in 0b00000000..0b01111111
defguard is_unicode(<<first, second>>)
when first in 0b11000000..0b11011111
and is_valid_utf_part(second)
defguard is_unicode(<<first, second, third>>)
when first in 0b11100000..0b11101111
and is_valid_utf_part(second)
and is_valid_utf_part(third)
defguard is_unicode(<<first, second, third, fourth>>)
when first in 0b11110000..0b11110111
and is_valid_utf_part(second)
and is_valid_utf_part(third)
and is_valid_utf_part(fourth)
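As far as I can tell, `defguard` accepts only plain variables as arguments, so the clauses above, which destructure `<<first, second>>` in the head, are unlikely to compile as written. Below is a sketch of the same idea built only from `byte_size/1` and `binary_part/3`, comparing each extracted byte as a one-byte binary. The module name `Utf8Guard`, the helper `is_lead_byte/3` and the function `check/1` are hypothetical, and the byte ranges are copied from the sketch above, so this checks only the byte layout, not full UTF-8 validity:

```elixir
defmodule Utf8Guard do
  # Hypothetical module: a variant of the guard idea using only guard-safe calls.

  # True when the byte at offset `pos` is a continuation byte (10xxxxxx).
  defguardp is_valid_utf_part(bin, pos)
            when binary_part(bin, pos, 1) >= <<0b10000000>> and
                   binary_part(bin, pos, 1) <= <<0b10111111>>

  # True when the first byte lies between the one-byte binaries `lo` and `hi`.
  defguardp is_lead_byte(bin, lo, hi)
            when binary_part(bin, 0, 1) >= lo and binary_part(bin, 0, 1) <= hi

  # byte_size/1 is checked first in every branch, so binary_part/3 is never
  # asked for a byte that is not there.
  defguard is_unicode(bin)
           when is_binary(bin) and
                  ((byte_size(bin) == 1 and
                      is_lead_byte(bin, <<0b00000000>>, <<0b01111111>>)) or
                     (byte_size(bin) == 2 and
                        is_lead_byte(bin, <<0b11000000>>, <<0b11011111>>) and
                        is_valid_utf_part(bin, 1)) or
                     (byte_size(bin) == 3 and
                        is_lead_byte(bin, <<0b11100000>>, <<0b11101111>>) and
                        is_valid_utf_part(bin, 1) and is_valid_utf_part(bin, 2)) or
                     (byte_size(bin) == 4 and
                        is_lead_byte(bin, <<0b11110000>>, <<0b11110111>>) and
                        is_valid_utf_part(bin, 1) and is_valid_utf_part(bin, 2) and
                        is_valid_utf_part(bin, 3)))

  # Example use of the guard in a function head.
  def check(value) when is_unicode(value), do: :single_char
  def check(_value), do: :other
end
```

With this in place, `Utf8Guard.check("○")` should take the first clause and `Utf8Guard.check("ab")` the second. For most code, the pattern-matching functions shown earlier are simpler than a guard of this size.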