
How to check whether input is a string in Erlang?

Tags:

erlang

I would like to write a function to check if the input is a string or not like this:

is_string(Input) ->
  case check_if_string(Input) of
    true  -> {ok, Input};
    false -> error
  end.

But I found it is tricky to check whether the input is a string in Erlang. The string definition in Erlang is here: http://erlang.org/doc/man/string.html.

Any suggestions?

Thanks in advance.

wufucious asked Jan 29 '23 13:01


1 Answer

In Erlang a string can actually be quite a few things, so there are a few ways to do this depending on exactly what you mean by "a string". It is worth bearing in mind that every sort of string in Erlang is a list of character or grapheme cluster values of some sort.

Encodings are not simple things, particularly when Unicode is involved. Code points can run as high as 16#10FFFF, grapheme clusters (user-perceived characters made of several code points) are grouped together as nested lists of integers, and Erlang iolist()s (which are super useful) are deep lists of mixed integer and binary values that get automatically flattened and converted during certain operations. If you are dealing with anything other than flat lists of printable ASCII values then I strongly recommend you read these:
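To make those shapes concrete, here is a minimal sketch. The module name string_shapes and the function demo/0 are made up for illustration; the calls themselves (lists:flatten/1, iolist_to_binary/1, unicode:characters_to_list/1) are standard. It shows a flat list, a deep list, and an iolist that all carry the same text:

```erlang
-module(string_shapes).
-export([demo/0]).

%% Hypothetical demo: three shapes that all represent the text "abc".
%% Each match below crashes with badmatch if the claim in the comment is wrong.
demo() ->
    Flat   = "abc",                  % the same term as [97,98,99]
    Deep   = ["ab", [$c]],           % deep list of characters
    IoList = [<<"ab">>, $c],         % iolist(): binaries and integers mixed
    Flat = lists:flatten(Deep),                 % flattening gives "abc"
    <<"abc">> = iolist_to_binary(IoList),       % iolist collapses to a binary
    Flat = unicode:characters_to_list(IoList),  % chardata to a flat list
    ok.
```

Calling string_shapes:demo() returns ok only if all three shapes really do reduce to the same flat list.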

  • Unicode module docs
  • String module docs
  • IO Library module docs

So... this is not a very simple question.

What to do about all the confusion?

Quick answer that always works: Consider the origin of the data.

You should know what kind of data you are dealing with, whether it is coming over a socket or from a file, or especially if you are generating it yourself. On the edges of your system you may need some help purifying data, though, because network clients send all sorts of random trash from time to time.
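As a rough sketch of checking data at the edge, assuming the bytes arriving from a socket or file are supposed to be UTF-8: the module edge_input and the function decode_utf8/1 are hypothetical names, while unicode:characters_to_list/2 is the standard library call doing the real work.

```erlang
-module(edge_input).
-export([decode_utf8/1]).

%% Hypothetical helper: convert bytes received from outside the system
%% into a flat list of Unicode code points, or reject them.
decode_utf8(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) -> {ok, Chars};
        {error, _Ok, _Rest}       -> error;   % invalid UTF-8 somewhere
        {incomplete, _Ok, _Rest}  -> error    % input ends mid-character
    end;
decode_utf8(_NotBinary) ->
    error.
```

Once data has passed a gate like this, the rest of the system can rely on knowing its shape instead of re-checking it everywhere.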

Some helper functions for the most common cases live in the io_lib module:

  • io_lib:char_list/1: Returns true if the input is a flat list of characters in the Unicode range.
  • io_lib:deep_char_list/1: Returns true if the input is a (possibly deep) list of characters in the Unicode range.
  • io_lib:deep_latin1_char_list/1: Returns true if the input is a (possibly deep) list of Latin-1 characters (ISO 8859-1, code points 0 to 255, a superset of ASCII).
  • io_lib:latin1_char_list/1: Returns true if the input is a flat list of Latin-1 characters (90% of the time this is what you're looking for)
  • io_lib:printable_latin1_list/1: Returns true if the input is a flat list of printable Latin-1 characters (If the above isn't what you wanted, 9% of the time this is the one you want)
  • io_lib:printable_list/1: Returns true if the input is a flat list of printable chars. What counts as printable depends on the +pc flag the VM was started with (see io:printable_range/0).
  • io_lib:printable_unicode_list/1: Returns true if the input is a flat list of printable Unicode chars (for that 1% of the time that this is your problem -- except that for some of us, myself included here in Japan, this one covers 99% of input checking cases).
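Plugged into the function from the question, one possible version looks like the sketch below. It assumes "a string" means a flat list of printable Unicode characters; swap in a different io_lib predicate if your definition differs. The module name strcheck is made up for the example.

```erlang
-module(strcheck).
-export([is_string/1, check_if_string/1]).

%% The function from the question, unchanged in shape.
is_string(Input) ->
    case check_if_string(Input) of
        true  -> {ok, Input};
        false -> error
    end.

%% Assumption: a string is a flat list of printable Unicode characters.
%% io_lib:printable_unicode_list/1 returns false for non-lists as well,
%% so binaries, atoms and numbers are rejected without a crash.
check_if_string(Input) ->
    io_lib:printable_unicode_list(Input).
```

Note that the empty list counts as a string here, so strcheck:is_string([]) returns {ok, []}. A binary such as <<"abc">> is rejected, because it is not a list.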

For more particular cases you can either use a regex from the re module or write your own recursive function that zips through a string for those special cases where a regex either doesn't fit, is impossible, or could make you vulnerable to regex attacks.
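For instance, a hand-written recursive check might look like the following sketch. The rule it enforces (a non-empty flat list of ASCII letters, digits, or underscores) and the names custom_check and is_identifier/1 are invented purely for illustration.

```erlang
-module(custom_check).
-export([is_identifier/1]).

%% Hypothetical rule: a non-empty flat list made only of ASCII letters,
%% digits and underscores. Anything else, including non-lists, is false.
is_identifier([C | Rest]) -> is_ident_char(C) andalso all_ident(Rest);
is_identifier(_)          -> false.

%% Walk the rest of the list; the last clause catches improper lists.
all_ident([])         -> true;
all_ident([C | Rest]) -> is_ident_char(C) andalso all_ident(Rest);
all_ident(_)          -> false.

is_ident_char(C) when is_integer(C), C >= $a, C =< $z -> true;
is_ident_char(C) when is_integer(C), C >= $A, C =< $Z -> true;
is_ident_char(C) when is_integer(C), C >= $0, C =< $9 -> true;
is_ident_char($_) -> true;
is_ident_char(_)  -> false.
```

A function like this makes a single pass over the list and does no backtracking, which is the property that keeps it safe from the pathological inputs some regexes are vulnerable to.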

zxq9 answered Feb 11 '23 04:02