What kinds of type errors can Haskell catch at compile time that Java cannot? [closed]

I'm just starting to learn Haskell and keep seeing references to its powerful type system. I see many instances in which the inference is much more powerful than Java's, but also the implication that it can catch more errors at compile time because of its superior type system. So I'm wondering if it would be possible to explain what kinds of errors Haskell can catch at compile time that Java cannot.

980

asked Aug 19 '14 01:08

pondermatic

1 Answer

Saying that Haskell's type system can catch more errors than Java's is a little misleading. Let's unpack this.

Statically Typed

Java and Haskell are both statically typed languages. By this I mean that the type of a given expression in the language is known at compile time. This has a number of advantages for both Java and Haskell; namely, it allows the compiler to check that expressions are "sane", for some reasonable definition of sane.

Yes, Java allows certain "mixed type" expressions, like "abc" + 2, which some may argue is unsafe or bad, but that is a subjective choice. In the end it is just a feature that the Java language offers, for better or worse.

Immutability

To see how Haskell code could be argued to be less error prone than Java (or C, C++, etc.) code, you must consider the type system with respect to the immutability of the language. In pure (normal) Haskell code, there are no side effects. That is to say, no value in the program, once created, may ever change. When we compute something we are creating a new result from the old result, but we don't modify the old value. This, as it turns out, has some really convenient consequences from a safety perspective. When we write code, we can be sure nothing else anywhere in the program is going to affect our function. Side effects, as it turns out, are the cause of many programming errors. An example would be a shared pointer in C that is freed in one function and then accessed in another, causing a crash. Or a variable that is set to null in Java,

String foo = "bar";
foo = null;
char c = foo.charAt(0); // Error! NullPointerException at run time

This could not happen in normal Haskell code because foo, once defined, cannot change, which means it cannot be set to null.
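As a minimal sketch of that point (the helper firstChar is a hypothetical name, not something from the question), a Haskell binding is fixed once it is defined, and a value that may be missing is usually given a Maybe type so the compiler makes the caller handle the missing case:

```haskell
import Data.Maybe (listToMaybe)

-- A binding is fixed once defined; there is no statement that reassigns it.
-- A second top-level definition such as  foo = "baz"  would be rejected
-- by the compiler as a duplicate definition.
foo :: String
foo = "bar"

-- Hypothetical helper for this sketch: the first character, if there is one.
-- The Maybe in the type forces callers to deal with the empty case.
firstChar :: String -> Maybe Char
firstChar = listToMaybe

main :: IO ()
main = do
  print (firstChar foo)  -- prints: Just 'b'
  print (firstChar "")   -- prints: Nothing
```

There is no value of type String that plays the role of Java's null, so the charAt failure above has no direct equivalent here.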

Enter the Type System

Now, you are probably wondering how the type system plays into all of this; that is what you asked about, after all. Well, as nice as immutability is, it turns out there is very little interesting work that you can do without any mutation. Reading from a file? Mutation. Writing to disk? Mutation. Talking to a web server? Mutation. So what do we do? In order to solve this issue, Haskell uses its type system to encapsulate mutation in a type, called the IO Monad. For instance, to read from a file, this function may be used,

readFile :: FilePath -> IO String

The IO Monad

Notice that the type of the result is not String; it is IO String. What this means, in layman's terms, is that the result introduces IO (side effects) to the program. In a well-formed program, IO will only take place inside the IO monad, thus allowing us to see very clearly where side effects can occur. This property is enforced by the type system. Further, values of type IO a can only produce their results, which are side effects, when they are run as part of the main function of the program. So now we have very neatly and nicely isolated the dangerous side effects to a controlled part of the program. When you get the result of the IO String, anything could happen, but at least this can't happen just anywhere, only in the main function and only as the result of values of type IO a.

Now to be clear, you can create IO a values anywhere in your code. You can even manipulate them outside the main function, but none of that manipulation will actually take place until the result is demanded in the body of the main function. For instance,

strReplicate :: IO String
strReplicate =
  readFile "somefile that doesn't exist" >>= return . concat . replicate 2

This action reads input from a file and produces that input twice, one copy appended to the other. So if the file had the characters abc, this would create a String with the contents abcabc. You can refer to strReplicate anywhere in your code, but Haskell will only actually try to read the file when the expression is run as part of the main function, because it is a value of type IO String. Like so,

main :: IO ()
main =
  strReplicate >>=
  putStrLn

This will almost surely fail, as the file you requested probably doesn't exist, but it will only fail here. You only have to worry about side effects here, not everywhere in your code, as you do in many other languages.
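The idea that IO values are inert until they are run from main can be sketched as follows. The names greet and actions are made up for this illustration; the point is only that building a list of actions is ordinary data manipulation, and nothing is printed until main sequences them:

```haskell
-- Hypothetical action used only for this sketch.
greet :: String -> IO ()
greet name = putStrLn ("hello, " ++ name)

-- Building a list of actions runs none of them; it is ordinary data.
actions :: [IO ()]
actions = [greet "a", greet "b", greet "c"]

main :: IO ()
main = do
  putStrLn ("built " ++ show (length actions) ++ " actions, ran none so far")
  -- Only the actions that are sequenced into main are executed.
  sequence_ (take 2 actions)
```

Running this prints the "built 3 actions" line followed by the greetings for a and b; the third action is never executed because main never runs it.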

There is a lot more to both IO and Monads in general than I have covered here, but that is probably beyond the scope of your question.

Type Inference

Now there is one more aspect to this: type inference.

Haskell uses a very advanced type inference system that allows you to write code that is statically typed without having to write type annotations, such as String foo in Java. GHC can infer the type of almost any expression, even very complex ones.

What this means for our safety discussion is that everywhere a value of type IO a is used in the program, the type system will make sure that it can't be used to produce an unexpected side effect. You can't cast it to a String and just get the result out wherever or whenever you want. You must explicitly introduce the side effect in the main function.
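Here is a small sketch of what "you can't cast it to a String" looks like in practice. The function names and the file name notes.txt are assumptions made for the example; the rejected definition is left commented out so the rest compiles:

```haskell
-- A pure function: its type promises that it performs no IO.
countLines :: String -> Int
countLines = length . lines

-- Rejected by the compiler, because readFile returns IO String, not String:
-- bad :: Int
-- bad = countLines (readFile "notes.txt")

-- Accepted: the result stays inside IO, so the effect is visible in the type.
countFileLines :: FilePath -> IO Int
countFileLines path = fmap countLines (readFile path)

main :: IO ()
main = do
  writeFile "notes.txt" "one\ntwo\n"  -- hypothetical file, created for the demo
  n <- countFileLines "notes.txt"
  print n  -- prints: 2
```

The pure part (countLines) can be tested and reasoned about on its own, while anything that touches the file system carries IO in its type.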

The Safety of Static Typing with the Ease of Dynamic Typing

The type inference system has some other nice properties as well. Often people enjoy scripting languages because they don't have to write all that boilerplate for the types like they would have to do in Java or C. This is because scripting languages are dynamically typed, meaning the type of an expression is only determined as the expression is being run by the interpreter. This makes these languages arguably more prone to errors, because you won't know if you have a bad expression until you run the code. For example, you might say something like this in Python.

def foo(x, y):
    return x + y

The problem with this is that x and y can be anything. So this would be fine,

foo(1,2) -> 3

But this would cause an error,

foo(1,[]) -> Error

And we have no way of checking that this is invalid until it is run.

It is very important to understand that no statically typed language has this problem, Java included. Haskell is not safer than Java in this sense. Haskell and Java both keep you safe from this type of error, but in Haskell you don't have to write all the types in order to be safe; the type system can infer them. In general, it is considered good practice to annotate the types of your functions in Haskell, even though you don't have to. In the body of a function, however, you rarely have to specify types (there are some strange edge cases where you will).
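For comparison with the Python foo above, here is a rough Haskell equivalent written without any type annotation. This is only a sketch; the inferred type shown in the comment is what GHC reports for this definition:

```haskell
-- No annotation given; GHC infers  foo :: Num a => a -> a -> a
foo x y = x + y

main :: IO ()
main = do
  print (foo 1 2)       -- prints: 3
  print (foo 1.5 2.25)  -- prints: 3.75
  -- print (foo 1 [])   -- rejected at compile time: a list is not a number
```

The same definition works for any numeric type, yet the bad call is caught before the program ever runs.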

Conclusion

Hopefully that helps illuminate how Haskell keeps you safe. And in regard to Java, you might say that in Java you have to work against the type system to write code, but in Haskell the type system works for you.

197

answered Nov 09 '22 22:11

isomarcte
