Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stack overflow when using Haskell sort function

Tags:

haskell

I am implementing a filter in Haskell, i.e. a program that I can call from the command-line as follows:

$ cat inputfile.txt | myFilter > outputfile.txt

When running the program on a file of about 80 MB, I get a stack overflow (Stack space overflow: current size 8388608 bytes.). I am using GHC Version 6.12.3 under cygwin.

I think the problem comes from the sort function that I am using in the program, but after I have been looking for the problem for three days I have no clue how to solve this so I would like if someone could give me a hint.

Here are the essential details about my program.

My filter program reads standard input into a string, splits it into lines and parses each line into a record of some type Event

data Event = ...

which is an instance of Ord

instance Ord Event where
    x < y = ...

so that I can sort events using the built-in sort function.

Splitting into lines and parsing the events (one event per line) is performed by a function

p :: String -> [Event]

which internally uses the standard function lines.

I also have a function g that groups events:

g :: [Event] -> [[Event]]

g uses some criteria that are not relevant here; each group can contain at most 4 events.

I sort each group of events (represented as a list) using sort (i.e., all events inside each group get sorted), and finally format the all event groups as a string using a function

f :: [[Event]] -> String

The main function looks as follows:

main = interact (f . (map sort) . g . p)

As said, running this program on a file of about 80 MB gives a stack overflow.

If I replace the sort function with the following function (a naive quick sort implementation):

mySort :: [Event] -> [Event]
mySort [] = []
mySort (e:es) = let h = [j | j <- es, j < e]
                    t = [j | j <- es, e < j]
                in
                  (mySort h) ++ [e] ++ (mySort t)


main = interact (f . (map mySort) . g . p)

I have no stack overflow!

If in the function mySort I replace the definition of t with the following:

                    t = [j | j <- es, e <= j]

i.e. I replace < with <=, the stack overflow is there again!

So I have no clue of what is going here. I cannot see that I have introduced any infinite recursion. My other hypothesis is that lazy evaluation can play a role here (does <= produce a bigger thunk than <?).

I have some experience with Haskell but I am no real expert so I would be glad to get some useful hint because I have been struggling to understand this for the past three days.

like image 647
Giorgio Avatar asked Aug 30 '12 18:08

Giorgio


2 Answers

The culprit is

instance Ord Event where
    x < y = ...

which is the wrong way to define an Ord instance. The minimal complete definition of an Ord instance defines one of compare or (<=). There are default definitions of compare in terms of (<=), and of all Ord member functions in terms of compare. So if you define (<), that's the only Ord member you can use, all others will loop infinitely when called, since they call compare, which calls (<=), which calls compare ...

The Data.List.sort function uses compare to determine the order, so it loops at the first comparison. Your custom quicksort only uses (<), so that works.

like image 68
Daniel Fischer Avatar answered Sep 28 '22 03:09

Daniel Fischer


Before trying out to profile the code and to get quick result - try to increase stack size.

Compile your program with enabling RTS and specify desired stack size.

http://book.realworldhaskell.org/read/profiling-and-optimization.html

like image 34
jdevelop Avatar answered Sep 28 '22 03:09

jdevelop