I am implementing a filter in Haskell, i.e. a program that I can call from the command-line as follows:
$ cat inputfile.txt | myFilter > outputfile.txt
When running the program on a file of about 80 MB, I get a stack overflow (Stack space overflow: current size 8388608 bytes.). I am using GHC Version 6.12.3 under cygwin.
I think the problem comes from the sort
function that I am using in the program, but after I have been looking for the problem for three days I have no clue how to solve this so I would like if someone could give me a hint.
Here are the essential details about my program.
My filter program reads standard input into a string, splits it into lines and parses each line into a record of some type Event
data Event = ...
which is an instance of Ord
instance Ord Event where
x < y = ...
so that I can sort events using the built-in sort
function.
Splitting into lines and parsing the events (one event per line) is performed by a function
p :: String -> [Event]
which internally uses the standard function lines
.
I also have a function g that groups events:
g :: [Event] -> [[Event]]
g uses some criteria that are not relevant here; each group can contain at most 4 events.
I sort each group of events (represented as a list) using sort
(i.e., all events inside each group get sorted), and finally format the all event groups as a string using a function
f :: [[Event]] -> String
The main function looks as follows:
main = interact (f . (map sort) . g . p)
As said, running this program on a file of about 80 MB gives a stack overflow.
If I replace the sort function with the following function (a naive quick sort implementation):
mySort :: [Event] -> [Event]
mySort [] = []
mySort (e:es) = let h = [j | j <- es, j < e]
t = [j | j <- es, e < j]
in
(mySort h) ++ [e] ++ (mySort t)
main = interact (f . (map mySort) . g . p)
I have no stack overflow!
If in the function mySort
I replace the definition of t
with the following:
t = [j | j <- es, e <= j]
i.e. I replace <
with <=
, the stack overflow is there again!
So I have no clue of what is going here.
I cannot see that I have introduced any infinite recursion. My other hypothesis is that lazy evaluation can play a role here (does <=
produce a bigger thunk than <
?).
I have some experience with Haskell but I am no real expert so I would be glad to get some useful hint because I have been struggling to understand this for the past three days.
The culprit is
instance Ord Event where
x < y = ...
which is the wrong way to define an Ord
instance. The minimal complete definition of an Ord
instance defines one of compare
or (<=)
. There are default definitions of compare
in terms of (<=)
, and of all Ord
member functions in terms of compare
. So if you define (<)
, that's the only Ord
member you can use, all others will loop infinitely when called, since they call compare
, which calls (<=)
, which calls compare
...
The Data.List.sort
function uses compare
to determine the order, so it loops at the first comparison. Your custom quicksort only uses (<)
, so that works.
Before trying out to profile the code and to get quick result - try to increase stack size.
Compile your program with enabling RTS and specify desired stack size.
http://book.realworldhaskell.org/read/profiling-and-optimization.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With