Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should I avoid constructing in Haskell?

Tags:

While reading a snipped from Haskell for Great Good I found the following situation:

treeInsert :: (Ord a) => a -> Tree a -> Tree a   treeInsert x EmptyTree = singleton x   treeInsert x (Node a left right)      | x == a = Node x left right     | x < a  = Node a (treeInsert x left) right     | x > a  = Node a left (treeInsert x right) 

Wouldn't it be better for performance if we just reused the given Tree when x == a?

treeInsert :: (Ord a) => a -> Tree a -> Tree a   treeInsert x EmptyTree = singleton x   treeInsert x all@(Node a left right)      | x == a = all     | x < a  = Node a (treeInsert x left) right     | otherwise  = Node a left (treeInsert x right) 

In real life coding, what should I do? Are there any drawbacks when returning the same thing?

like image 437
FtheBuilder Avatar asked Oct 03 '15 12:10

FtheBuilder


People also ask

Is Haskell IO pure?

Haskell is a pure language and even the I/O system can't break this purity. Being pure means that the result of any function call is fully determined by its arguments. Procedural entities like rand() or getchar() in C, which return different results on each call, are simply impossible to write in Haskell.

What is a Haskell constructor?

Data constructors are first class values in Haskell and actually have a type. For instance, the type of the Left constructor of the Either data type is: Left :: a -> Either a b. As first class values, they may be passed to functions, held in a list, be data elements of other algebraic data types and so forth.

When should I use Haskell?

Definition on Haskell Where Function. Haskell where is not a function rather it is a keyword that is used to divide the more complex logic or calculation into smaller parts, which makes the logic or calculation easy to understand and handle.


2 Answers

Let's look at the core! (Without optimisations here)

$ ghc-7.8.2 -ddump-simpl wtmpf-file13495.hs

The relevant difference is that the first version (without all@(...)) has

                case GHC.Classes.> @ a_aUH $dOrd_aUV eta_B2 a1_aBQ                 of _ [Occ=Dead] {                   GHC.Types.False ->                     Control.Exception.Base.patError                       @ (TreeInsert.Tree a_aUH)                       "wtmpf-file13495.hs:(9,1)-(13,45)|function treeInsert"#;                   GHC.Types.True ->                     TreeInsert.Node                       @ a_aUH                       a1_aBQ                       left_aBR                       (TreeInsert.treeInsert @ a_aUH $dOrd_aUV eta_B2 right_aBS) 

where reusing the node with that as-pattern does just

                TreeInsert.Node                   @ a_aUI                   a1_aBR                   left_aBS                   (TreeInsert.treeInsert @ a_aUI $dOrd_aUW eta_B2 right_aBT); 

This is an avoided check that may well make a significant performance difference.

However, this difference has actually nothing to do with the as-pattern. It's just because your first snippet uses a x > a guard, which is not trivial. The second uses otherwise, which is optimised away.

If you change the first snippet to

treeInsert :: (Ord a) => a -> Tree a -> Tree a   treeInsert x EmptyTree = singleton x   treeInsert x (Node a left right)      | x == a     = Node x left right     | x < a      = Node a (treeInsert x left) right     | otherwise  = Node a left (treeInsert x right) 

then the difference boils down to

      GHC.Types.True -> TreeInsert.Node @ a_aUH a1_aBQ left_aBR right_aBS 

vs

      GHC.Types.True -> wild_Xa 

Which is indeed just the difference of Node x left right vs all.

...without optimisations, that is. The versions diverge further when I turn on -O2. But I can't really make out how the performance would differ, there.

like image 193
leftaroundabout Avatar answered Sep 21 '22 14:09

leftaroundabout


In real life coding, what should I do? Are there any drawbacks when returning the same thing?

a == b does not guarantee that f a == f b for all functions f. So, you may have to return new object even if they compare equal.

In other words, it may not be feasible to change Node x left right to Node a left right or all when a == x regardless of performance gains.

For example you may have types which carry meta data. When you compare them for equality, you may only care about the values and ignore the meta data. But if you replace them just because they compare equal then you will loose the meta data.

newtype ValMeta a b = ValMeta (a, b)  -- value, along with meta data     deriving (Show)  instance Eq a => Eq (ValMeta a b) where     -- equality only compares values, ignores meta data     ValMeta (a, b) == ValMeta (a', b') = a == a' 

The point is Eq type-class only says that you may compare values for equality. It does not guarantee anything beyond that.

A real-world example where a == b does not guarantee that f a == f b is when you maintain a Set of unique values within a self-balancing tree. A self-balancing tree (such as Red-Black tree) has some guarantees about the structure of tree but the actual depth and structure depends on the order that the data were added to or removed from the set.

Now when you compare 2 sets for equality, you want to compare that values within the set are equal, not that the underlying trees have the same exact structure. But if you have a function such as depth which exposes the depth of underlying tree maintaining the set then you cannot guarantee that the depths are equal even if the sets compare equal.

Here is a video of great Philip Wadler realizing live and on-stage that many useful relations do not preserve equality (starting at 42min).

Edit: Example from ghc where a == b does not imply f a == f b:

\> import Data.Set \> let a = fromList [1, 2, 3, 4, 5, 10, 9, 8, 7, 6] \> let b = fromList [1..10] \> let f = showTree \> a == b True \> f a == f b False 

Another real-world example is hash-table. Two hash-tables are equal if and only if their key-value pairs tie out. However, the capacity of a hash-table, i.e. the number of keys you may add before having to re-allocate and rehash, depends on the order of inserts/deletes.

So if you have a function which returns the capacity of hash table, it may return different values for hash-tables a and b even though a == b.

like image 40
behzad.nouri Avatar answered Sep 21 '22 14:09

behzad.nouri