I'm just starting to learn Haskell and keep seeing references to its powerful type system. I see many instances in which the inference is much more powerful than Java's, but also the implication that it can catch more errors at compile time because of its superior type system. So, I'm wondering if it would be possible to explain what types of errors Haskell can catch at compile time that Java cannot.
What is TypeError? The custom type errors mechanism allows Haskell developers to introduce their own compile-time error messages about usages of their functions without needing to fork GHC and patch it for their particular use cases. It provides a user-level way of extending the capabilities of the compiler.
So no, Haskell types do not exist at runtime, in any form.
There are three different ways exceptions can be thrown in Haskell: Synchronously thrown: an exception is generated from IO code and thrown inside a single thread. Asynchronously thrown: an exception is thrown from one thread to another thread to cause it to terminate early.
Saying that Haskell's type system can catch more errors than Java's is a little bit misleading. Let's unpack this a little bit.
Java and Haskell are both statically typed languages. By this I mean that the type of a given expression in the language is known at compile time. This has a number of advantages for both Java and Haskell; namely, it allows the compiler to check that the expressions are "sane", for some reasonable definition of sane.
Yes, Java allows certain "mixed type" expressions, like "abc" + 2, which some may argue is unsafe or bad, but that is a subjective choice. In the end it is just a feature that the Java language offers, for better or worse.
To see how Haskell code could be argued to be less error prone than Java (or C, C++, etc.) code, you must consider the type system with respect to the immutability of the language. In pure (normal) Haskell code, there are no side effects. That is to say, no value in the program, once created, may ever change. When we compute something, we create a new result from the old one, but we don't modify the old value. This, as it turns out, has some really convenient consequences from a safety perspective. When we write code, we can be sure nothing else anywhere in the program is going to affect our function. Side effects, as it turns out, are the cause of many programming errors. An example would be a shared pointer in C that is freed in one function and then accessed in another, causing a crash. Or a variable that is set to null in Java,
String foo = "bar";
foo = null;
char c = foo.charAt(0); // Error! NullPointerException at run time
This could not happen in normal Haskell code, because foo, once defined, cannot change, which means it cannot be set to null.
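As a rough illustration of how this looks on the Haskell side, here is a minimal sketch. The names firstChar and shout are hypothetical, and the use of Maybe to model a possibly missing value is an addition that the answer above does not spell out.

```haskell
-- Hypothetical helpers for illustration; not part of the original answer.

-- A value that may be absent is modelled as 'Maybe String', not as null.
-- The compiler makes us say what happens in every case.
firstChar :: Maybe String -> Maybe Char
firstChar Nothing        = Nothing   -- the "null" case has to be handled
firstChar (Just [])      = Nothing   -- so does the empty string
firstChar (Just (c : _)) = Just c

-- A plain String can never be "null", so there is nothing to check here.
shout :: String -> String
shout s = s ++ "!"

-- Rejected at compile time, because a Maybe String is not a String:
-- bad :: String
-- bad = shout (Just "bar")
```

The point of the sketch is only that absence has to show up in the type, so forgetting to handle it is a compile-time error rather than a crash at run time.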
Now, you are probably wondering how the type system plays into all of this, that is what you asked about after all. Well, as nice as immutability is, it turns out there is very little interesting work that you can do without any mutation. Reading from a file? Mutation. Writing to disk? Mutation. Talking to a web server? Mutation. So what do we do? In order to solve this issue, Haskell uses its type system to encapsulate mutation in a type, called the IO Monad. For instance to read from a file, this function may be used,
readFile :: FilePath -> IO String
Notice that the type of the result is not a String, it is an IO String. What this means, in layman's terms, is that the result introduces IO (side effects) to the program. In a well-formed program, IO will only take place inside the IO monad, thus allowing us to see very clearly where side effects can occur. This property is enforced by the type system. Further, IO a values only produce their results, which are side effects, when they are run as part of the main function of the program. So now we have very neatly isolated the dangerous side effects to a controlled part of the program. When you get the result of the IO String, anything could happen, but at least this can't happen just anywhere, only in the main function and only as the result of IO a values.
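To make the separation concrete, here is a small sketch. The function names countWords and countWordsInFile are made up for illustration; only readFile comes from the standard Prelude.

```haskell
-- Pure: the type promises there is no file access or any other side effect.
countWords :: String -> Int
countWords = length . words

-- Effectful: because it calls readFile, the result type has to be in IO.
countWordsInFile :: FilePath -> IO Int
countWordsInFile path = fmap countWords (readFile path)

-- Rejected by the type checker: readFile returns IO String, not String,
-- so a function that claims to be pure cannot quietly read a file.
-- broken :: FilePath -> Int
-- broken path = countWords (readFile path)
```

Reading the signatures alone tells you which of the two functions can touch the outside world.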
Now to be clear, you can create IO a values anywhere in your code. You can even manipulate them outside the main function, but none of that manipulation will actually take place until the result is demanded in the body of the main function. For instance,
strReplicate :: IO String
strReplicate = readFile "somefile that doesn't exist" >>= return . concat . replicate 2
This function reads input from a file and concatenates two copies of it. So if the file had the characters abc, this would create a String with the contents abcabc. You can refer to this value anywhere in your code, but Haskell will only actually try to read the file when the expression is run as part of the main function, because it is a value in the IO monad. Like so,
main :: IO ()
main = strReplicate >>= putStrLn
This will almost surely fail, as the file you requested probably doesn't exist, but it will only fail here. You only have to worry about side effects here, not everywhere in your code, as you do in many other languages.
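The idea that building an IO value is different from running it can be observed directly. The sketch below is an illustration under assumptions: bump and demo are hypothetical names, and the counter in an IORef is used only to make it visible when an action actually runs.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical action: bumps a counter so we can observe when it runs.
bump :: IORef Int -> IO ()
bump ref = modifyIORef ref (+ 1)

-- Describes five actions, runs only two of them, and reports the
-- counter before and after.
demo :: IO (Int, Int)
demo = do
  ref <- newIORef 0
  let actions = replicate 5 (bump ref)  -- five actions described, none run yet
  before <- readIORef ref               -- the counter has not moved
  sequence_ (take 2 actions)            -- run exactly two of them
  after <- readIORef ref
  return (before, after)
```

Running demo from main yields (0, 2): creating the list of actions changed nothing, and only the two actions that were sequenced had any effect.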
There is a lot more to both IO and Monads in general than I have covered here, but that is probably beyond the scope of your question.
Now there is one more aspect to this: type inference. Haskell uses a very advanced type inference system that allows you to write statically typed code without having to write type annotations, such as String foo in Java. GHC can infer the type of almost any expression, even very complex ones.
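As a small sketch of what that looks like in practice, the first two definitions below carry no annotations at all. The names are hypothetical, and the comments show the types GHC would be expected to infer for them.

```haskell
-- No annotations on the next two definitions; GHC works the types out.

-- Inferred: swapPair :: (a, b) -> (b, a)
swapPair (x, y) = (y, x)

-- Inferred: reverseWithLength :: [a] -> ([a], Int)
reverseWithLength xs = (reverse xs, length xs)

-- An annotated use site. If the inferred types above did not fit
-- together, this definition would fail to compile.
example :: (Int, String)
example = swapPair (reverseWithLength "abc")
```

Everything here is still checked statically; the annotation on example is for the reader's benefit, not the compiler's.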
What this means for our safety discussion is that everywhere an IO a value is used in the program, the type system will make sure that it can't be used to produce an unexpected side effect. You can't cast it to a String and just get the result out wherever or whenever you want. You must explicitly introduce the side effect in the main function.
The type inference system has some other nice properties as well. Often people enjoy scripting languages because they don't have to write all that boilerplate for the types like they would have to do in Java or C. This is because scripting languages are dynamically typed: the type of an expression is only computed as the expression is being run by the interpreter. This makes these languages arguably more prone to errors, because you won't know if you have a bad expression until you run the code. For example, you might say something like this in Python.
def foo(x, y):
    return x + y
The problem with this is that x and y can be anything. So this would be fine,
foo(1,2) -> 3
But this would cause an error,
foo(1,[]) -> Error
And we have no way of checking that this is invalid until it is run.
It is very important to understand that no statically typed language has this problem, Java included. Haskell is not safer than Java in this sense. Haskell and Java both keep you safe from this type of error, but in Haskell you don't have to write all the types in order to be safe; the type system can infer them. In general, it is considered good practice to annotate the types of your functions in Haskell, even though you don't have to. In the body of the function, however, you rarely have to specify types (there are some strange edge cases where you will).
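For comparison, here is a sketch of the same foo in Haskell, written without a type annotation. The names three, half and bad are made up for the example, and the rejected call assumes the standard libraries, which define no Num instance for lists.

```haskell
-- No annotation needed; GHC infers  foo :: Num a => a -> a -> a
foo x y = x + y

-- Fine: both arguments are numbers of the same type.
three :: Int
three = foo 1 2

-- Also fine, at a different numeric type.
half :: Double
half = foo 0.25 0.25

-- Rejected at compile time, unlike the Python call foo(1, []):
-- a list is not a number, so there is no way to add it to 1.
-- bad :: [Int]
-- bad = foo 1 []
```

So the definition is as short as the Python one, but the bad call never makes it past the compiler.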
Hopefully that helps illuminate how Haskell keeps you safe. And in regard to Java, you might say that in Java you have to work against the type system to write code, but in Haskell the type system works for you.