
Pattern match on any single UTF-8 character

I would like to have a function clause that matches any single UTF-8 character.

I can match on specific characters like this

def foo("a") do
  "It's an a"
end

But I cannot determine whether it is possible to do the same for any single UTF-8 character.

My current solution is to split the string to a char list and pattern match on that, but I was curious if I could skip that step.
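For reference, the workaround described above might look roughly like the sketch below. The module and function names (`ViaCharlist`, `single?/1`) are made up for illustration, and it assumes the conversion is done with `String.to_charlist/1`.

```elixir
defmodule ViaCharlist do
  # Hypothetical helper: convert to a charlist first, then match on a
  # one-element list. Raises if the binary is not valid UTF-8.
  def single?(string) when is_binary(string) do
    case String.to_charlist(string) do
      [_c] -> true
      _ -> false
    end
  end
end

ViaCharlist.single?("a")   # true
ViaCharlist.single?("ab")  # false
```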

lpil asked Aug 12 '15 22:08


2 Answers

You can do this with:

# The underscore prefix avoids an "unused variable" compiler warning.
def char?(<<_c::utf8>>), do: true
def char?(_), do: false
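As a quick check, here is a minimal sketch that wraps those clauses in a module so they can be run as a script. The module name `CharCheck` is only an assumption for the example. Note that `::utf8` matches one Unicode codepoint, which can span several bytes.

```elixir
defmodule CharCheck do
  # True only when the whole binary is exactly one UTF-8 encoded codepoint.
  def char?(<<_c::utf8>>), do: true
  def char?(_), do: false
end

CharCheck.char?("a")        # true
CharCheck.char?("\u00E9")   # true: two bytes, one codepoint
CharCheck.char?("ab")       # false: two codepoints
CharCheck.char?("")         # false: nothing to match
CharCheck.char?(<<0xFF>>)   # false: not valid UTF-8
```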

Note that this only matches a binary containing exactly one character (one Unicode codepoint). To match on the first character of a longer string, match the rest of the binary as well:

def char?(<<_c::utf8, _rest::binary>>), do: true
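The same head/rest pattern can be used to walk a string one codepoint at a time. The sketch below is only an illustration; `CharWalk` and `codepoints/1` are hypothetical names, and an invalid UTF-8 binary would raise a `FunctionClauseError` because neither clause matches it.

```elixir
defmodule CharWalk do
  # Peel off one codepoint at a time; `rest` holds the remaining bytes.
  def codepoints(<<c::utf8, rest::binary>>), do: [c | codepoints(rest)]
  def codepoints(<<>>), do: []
end

CharWalk.codepoints("h\u00E9")  # [104, 233]
```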
bitwalker answered Oct 22 '22 11:10


From the Regex docs:

The modifiers available when creating a Regex are: ...

  • unicode (u) - enables Unicode specific patterns like \p and changes modifiers like \w, \W, \s and friends to also match on Unicode. It expects valid Unicode strings to be given on match
  • dotall (s) - causes dot to match newlines and also set newline to anycrlf; the new line setting can be overridden by setting (*CR) or (*LF) or (*CRLF) or (*ANY) according to :re documentation

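So you might try ~r/./us. Unanchored, that pattern matches any string containing at least one character; to require exactly one character, anchor it as ~r/\A.\z/us.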
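A small sketch of the regex approach follows, assuming the pattern is anchored with `\A` and `\z` so that the whole string must be a single character. A regex cannot appear in a function head, so this check has to run inside a function body rather than select a clause.

```elixir
single = ~r/\A.\z/us

Regex.match?(single, "a")        # true
Regex.match?(single, "\u00E9")   # true: the u modifier treats it as one character
Regex.match?(single, "\n")       # true: the s modifier lets dot match a newline
Regex.match?(single, "ab")       # false
Regex.match?(~r/./us, "ab")      # true: unanchored, any non-empty string matches
```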

From http://elixir-lang.org/crash-course.html:

In Elixir, the word string means a UTF-8 binary and there is a String module that works on such data
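To illustrate what "UTF-8 binary" means in practice, the following sketch compares byte count and character count for a two-character string whose second character takes two bytes. The variable name `s` is arbitrary.

```elixir
s = "h\u00E9"

is_binary(s)             # true: a string is a binary
byte_size(s)             # 3: "h" is one byte, U+00E9 is two
String.length(s)         # 2: two characters
s == <<104, 195, 169>>   # true: the underlying UTF-8 bytes
```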

So I think you should be good to go.

Kevin Wheeler answered Oct 22 '22 11:10