How does one avoid creating an ad-hoc type system in dynamically typed languages?

In every project I've started in languages without type systems, I eventually begin to invent a runtime type system. Maybe the term "type system" is too strong; at the very least, I create a set of type/value-range validators when I'm working with complex data types, and then I feel the need to be paranoid about where data types can be created and modified.

I hadn't thought twice about it until now. As an independent developer, my methods have been working in practice on a number of small projects, and there's no reason they'd stop working now.

Nonetheless, this must be wrong. I feel as if I'm not using dynamically-typed languages "correctly". If I must invent a type system and enforce it myself, I may as well use a language that has types to begin with.

So, my questions are:

  • Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
  • Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?

Here is a concrete example for you to consider. I'm working with datetimes and timezones in Erlang (a dynamic, strongly typed language). This is a common datatype I work with:

{{Y,M,D},{tztime, {time, HH,MM,SS}, Flag}}

... where {Y,M,D} is a tuple representing a valid date (all entries are integers), tztime and time are atoms, HH,MM,SS are integers representing a sane 24-hr time, and Flag is one of the atoms u,d,z,s,w.

This datatype is commonly parsed from input, so to ensure valid input and a correct parser, the values need to be checked for type correctness and for valid ranges. Later on, instances of this datatype are compared to each other, which makes the types of their values all the more important, since any two terms can be compared. From the Erlang reference manual:

number < atom < reference < fun < port < pid < tuple < list < bit string
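To make the comparison concern concrete, here is a minimal sketch of a naive comparison on the datatype above (the module and function names are hypothetical). Because Erlang's term order is total, the comparison never raises, so a malformed value whose hour is the string "02" silently sorts after a valid 23:00 on the same date.

```erlang
-module(term_order_demo).
-export([later/2]).

%% Hypothetical helper: true if datetime A sorts after datetime B.
%% Tuples of equal size are compared element by element, and any two
%% terms are comparable, so this never raises, even for malformed input.
%% E.g. an hour given as the list "02" sorts after every integer hour,
%% because number < list in the term order.
later(A, B) -> A > B.
```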
drfloob asked Dec 16 '10 02:12



2 Answers

Aside from the confusion of static vs. dynamic and strong vs. weak typing:

What you want to implement in your example isn't really solved by most existing static type systems. Range checks and complications like February 31st, and especially parsed input, are usually checked at runtime no matter what type system you have.

Since your example is in Erlang, I have a few recommendations:

  • Use records. Besides being useful for a whole bunch of other reasons, they give you easy runtime type checking without much effort, e.g.:

    is_same_day(#datetime{year=Y1, month=M1, day=D1}, 
                #datetime{year=Y2, month=M2, day=D2}) -> ...
    

    This clause matches only two datetime records, with no extra effort. You could even add guards to check ranges if the source is untrusted. It also fits Erlang's "let it crash" approach to error handling: if no clause matches, you get a function_clause error, and you can handle it at the level where that is appropriate (usually the supervisor level).

  • Generally, write your code so that it crashes when its assumptions are not valid.

  • If this doesn't feel statically checked enough, use TypEr and Dialyzer to find the kinds of errors that can be found statically; whatever remains will be checked at runtime.

  • Don't be too restrictive about which "types" your functions accept. Sometimes the added functionality of doing something useful even for different inputs is worth more than checking types and ranges in every function. If you check where it matters, you will usually catch the error early enough for it to be easy to fix. This is especially true in a functional language, where you always know where every value comes from.
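A sketch of the record approach, combining the range guards mentioned above with -spec annotations that TypEr and Dialyzer can use. The record name, its fields, and the new/7 constructor are assumptions for illustration; calendar:valid_date/3 is the standard OTP date check.

```erlang
-module(datetime_rec).
-export([new/7, is_same_day/2]).

%% Hypothetical record; the field names are illustrative only.
-record(datetime, {year, month, day, hour, minute, second, flag}).

-spec new(integer(), integer(), integer(), 0..23, 0..59, 0..59,
          u | d | z | s | w) -> #datetime{}.
%% Guards enforce the time ranges and the allowed flags; anything
%% else crashes with function_clause instead of returning an error.
new(Y, Mo, D, H, Mi, S, Flag)
  when is_integer(H), H >= 0, H =< 23,
       is_integer(Mi), Mi >= 0, Mi =< 59,
       is_integer(S), S >= 0, S =< 59,
       (Flag =:= u orelse Flag =:= d orelse Flag =:= z orelse
        Flag =:= s orelse Flag =:= w) ->
    %% badmatch if the date does not exist (e.g. February 31st);
    %% non-integer Y/Mo/D crash inside calendar:valid_date/3.
    true = calendar:valid_date(Y, Mo, D),
    #datetime{year = Y, month = Mo, day = D,
              hour = H, minute = Mi, second = S, flag = Flag}.

-spec is_same_day(#datetime{}, #datetime{}) -> boolean().
%% Only two #datetime{} records match; the repeated variables make
%% the first clause succeed only when year, month and day are equal.
is_same_day(#datetime{year = Y, month = M, day = D},
            #datetime{year = Y, month = M, day = D}) -> true;
is_same_day(#datetime{}, #datetime{}) -> false.
```

Passing anything other than two #datetime{} records to is_same_day/2 fails with function_clause, and an impossible date crashes new/7 with a badmatch, which is the "let it crash" behaviour described above.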

Peer Stritzinger answered Sep 17 '22 12:09


There are a lot of good answers already; let me add:

Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?

The most important paradigm, especially in Erlang, is this: assume the type is right, otherwise let it crash. Don't write excessively checking, paranoid code; assume that the input you get is of the right type or matches the right pattern. In general (there are exceptions to this rule), don't write:

foo({tag, ...}) -> do_something(..);
foo({tag2, ...}) -> do_something_else(..);
foo(Otherwise)  ->
    report_error(Otherwise),
    try to fix problem here...

Kill the last clause and have it crash right away. Let a supervisor and other processes do the cleanup (janitorial processes can use monitors, via erlang:monitor/2, to learn when a crash has occurred).
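A minimal sketch of that idea, with hypothetical names: handle/1 has no catch-all clause, and run/1 plays the janitor by running it in a process started with spawn_monitor/1 and watching for the 'DOWN' message. In a real system a supervisor would usually take this role.

```erlang
-module(crash_demo).
-export([handle/1, run/1]).

%% Only the expected shapes are handled. There is deliberately no
%% catch-all clause, so any other input crashes with function_clause.
handle({tag, X})  -> {ok, X};
handle({tag2, X}) -> {ok2, X}.

%% Run handle/1 in a monitored worker. The crash stays in the worker;
%% the caller only observes the 'DOWN' message and its exit reason.
run(Msg) ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() -> Parent ! {self(), handle(Msg)} end),
    receive
        {Pid, Result} ->
            %% Success: drop the monitor and any pending 'DOWN' message.
            erlang:demonitor(Ref, [flush]),
            {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} ->
            {crashed, Reason}
    end.
```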

Do be precise, however. Write

bar(N) when is_integer(N) -> ...

baz([]) -> ...
baz(L) when is_list(L) -> ...

if the function is known to work only with integers or lists, respectively. Yes, it is a runtime check, but the goal is to convey information to the programmer. Also, HiPE tends to use the hint for optimization and eliminates the type check where possible. Hence, the price may be lower than you think.
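A runnable version of the two guarded functions above. The function bodies are hypothetical, since the answer elides them; calling either function with the wrong kind of argument raises function_clause.

```erlang
-module(guard_demo).
-export([bar/1, baz/1]).

%% The guard documents (and enforces) that bar/1 only takes integers.
bar(N) when is_integer(N) -> N * 2.

%% baz/1 only takes lists: the empty list, or any other list.
baz([]) -> empty;
baz(L) when is_list(L) -> {length, length(L)}.
```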

You chose an untyped/dynamically typed language, so the price you pay is that type checking, and errors from type clashes, happen at runtime. As other posts hint, a statically typed language is not exempt from doing some checks as well: the type system is (usually) an approximation of a proof of correctness. In most static languages you still get input you can't trust. This input is transformed at the "border" of the application and converted to an internal format. The conversion serves to mark trust: from then on, the value has been validated and we can assume certain things about it. The power and correctness of this assumption are directly tied to its type signature and to how well the programmer juggles the static types of the language.
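A sketch of that border conversion for the datatype from the question, under assumed names (border_demo, the #dt{} record, from_input/1, hour_of/1). Untrusted input is checked once and becomes either a record or an {error, Reason} tuple; code past the border pattern-matches on the record and trusts it.

```erlang
-module(border_demo).
-export([from_input/1, hour_of/1]).

%% Hypothetical internal format used after validation.
-record(dt, {date, time, flag}).

%% Border: check types and ranges once. Rejecting bad external data is
%% expected here, so it returns an error tuple instead of crashing.
from_input({{Y, M, D}, {tztime, {time, HH, MM, SS}, Flag}} = Raw)
  when is_integer(Y), is_integer(M), is_integer(D),
       is_integer(HH), HH >= 0, HH =< 23,
       is_integer(MM), MM >= 0, MM =< 59,
       is_integer(SS), SS >= 0, SS =< 59,
       (Flag =:= u orelse Flag =:= d orelse Flag =:= z orelse
        Flag =:= s orelse Flag =:= w) ->
    case calendar:valid_date(Y, M, D) of
        true  -> {ok, #dt{date = {Y, M, D}, time = {HH, MM, SS}, flag = Flag}};
        false -> {error, {invalid_date, Raw}}
    end;
from_input(Raw) ->
    {error, {malformed, Raw}}.

%% Internal code: trusts the #dt{} record and does no further checks.
hour_of(#dt{time = {HH, _, _}}) -> HH.
```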

Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?

Erlang has Dialyzer, which can be used to statically analyze your programs and infer their types. It will not find as many type errors as a type checker in, e.g., OCaml, but it won't "cry wolf" either: an error from Dialyzer is provably an error in the program, and it won't reject a program which may be working fine. A simple example is:

my_and(true, true) -> true;
my_and(true, _)    -> false;
my_and(false, _)   -> false.

(The function cannot be named and, since that is a reserved word in Erlang.) The invocation my_and(true, greatmistake) returns false, yet a static type system would reject the program, because it infers from the first clause that the second parameter must be a boolean(). Dialyzer, in contrast, accepts this function and gives it the signature (boolean(), term()) -> boolean(). It can do this because there is no need to guard against an error a priori: if there is a mistake, the runtime system's type check will catch it.
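A self-contained module for this example, with an explicit -spec written to match the signature described above. Because and is a reserved word in Erlang, the function is named my_and; the spec is hand-written here, so the code itself does not show what Dialyzer prints.

```erlang
-module(and_demo).
-export([my_and/2]).

%% Hand-written spec matching the signature discussed above: the second
%% argument is unconstrained, since it only matters when the first is true.
-spec my_and(boolean(), term()) -> boolean().
my_and(true, true) -> true;
my_and(true, _)    -> false;
my_and(false, _)   -> false.
```

A non-boolean first argument still fails at runtime with function_clause, which is the runtime check the answer relies on.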

I GIVE CRAP ANSWERS answered Sep 21 '22 12:09