How to check whether a grapheme is a letter?

How do I check whether a grapheme is a letter (or something else that is commonly used in words, like a hieroglyph)?

After looking through Elixir's String documentation, the only way I can see is to check whether String.downcase and String.upcase return the same string. If and only if they do, the grapheme is not something that is used in words.

This is how I do it, but surely there should be a simpler way?

defmodule Words do
  defp all_letters_uppercase?(string) do
    String.upcase(string) == string
  end

  defp all_letters_downcase?(string) do
    String.downcase(string) == string
  end

  defp contains_letter?(string) do
    not (all_letters_uppercase?(string) and all_letters_downcase?(string))
  end

  def single_grapheme?(string) do
    with graphemes = String.graphemes(string) do
      length(graphemes) == 1 and hd(graphemes) == string
    end
  end

  @doc """
  Check whether string is a single letter.
  """
  def letter?(string) do
    single_grapheme?(string) and contains_letter?(string)
  end
end

Update: my code doesn't work for Japanese letters:

iex(35)> Words.letter?("グ")            
false
asked Feb 07 '23 06:02 by CrabMan


1 Answer

You can use a regular expression to test for Unicode character properties, one of which is the Letter general category, written \p{L}. You may also want to append \p{M}* (the Mark category, zero or more times) so that any combining diacritics following the letter are matched as well. This closely mirrors the logic found in String.graphemes/1. Be sure to add the u modifier after the regex to enable these Unicode features. For example:

iex> String.match?("グ", ~r/\A\p{L}\p{M}*\z/u)
true
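
To plug this into the question's code, the regex can be wrapped in a function. The following is only a minimal sketch: it reuses the Words module name and the letter?/1 name from the question as an assumption about how you want to call it, and it replaces the upcase/downcase comparison with the regex check above.

```elixir
defmodule Words do
  @doc """
  Returns true if `string` is exactly one letter, optionally followed by
  combining marks (for example, a base letter plus a combining accent).
  """
  def letter?(string) when is_binary(string) do
    # \A and \z anchor the match to the whole string,
    # \p{L} matches one Unicode letter,
    # \p{M}* allows any number of trailing combining marks,
    # and the `u` modifier enables Unicode mode.
    String.match?(string, ~r/\A\p{L}\p{M}*\z/u)
  end
end

IO.inspect(Words.letter?("グ"))        # true: katakana has no case, but is a letter
IO.inspect(Words.letter?("e\u0301"))   # true: "e" followed by a combining acute accent
IO.inspect(Words.letter?("1"))         # false: a digit is not in \p{L}
IO.inspect(Words.letter?("ab"))        # false: two letters, not one
```

Because the pattern is anchored at both ends, the separate single_grapheme?/1 check from the question should no longer be needed for these cases.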

Also see http://erlang.org/doc/man/re.html, section on "Unicode character properties" and http://www.regular-expressions.info/unicode.html#grapheme.

answered Feb 27 '23 06:02 by Patrick Oscity