I'm new to Clojure and am having trouble figuring out how to avoid stack overflows in certain situations. One such situation came up while trying to port a parsing project to Clojure with a parser combinator library I found called kern.
Kern defines a recursive implementation for a "many-till" parser: source
This works fine for small inputs:
(def input-text "The blue {cat} and the red {dog} became best friends with the white {wolf} END {not included}")
(def between-brackets "parser that grabs all text between brackets"
(between (sym* \{) (sym* \}) (<+> (many (none-of* "}")))))
(def parse-enclosed-words "parser that attempts to grab text between brackets,
skips a character when it can't, and halts when it hits the string END"
(many-till (<|> between-brackets (skip any-char)) (token* "END")))
(filter #(some? %) (value parse-enclosed-words input-text)) ;; => ("cat" "dog" "wolf")
Unfortunately, the parser encounters stack overflows as input strings grow:
(def file-input-text (slurp (io/resource "[input-text-20x-repeated.txt][2]") ))
(filter #(some? %) (value parse-enclosed-words file-input-text)) ;; => Unhandled java.lang.StackOverflowError
From what I can tell by reading online, this is probably due to the function using stack-consuming recursion. I played around with some attempts to re-write the function using the "recur" keyword, but since the recursive call is not in tail position this didn't seem to work.
How can many-till be modified to avoid stack overflow?
In order to prevent stack overflow bugs, you must have a base case where the function stops make new recursive calls. If there is no base case then the function calls will never stop and eventually a stack overflow will occur.
One method to prevent stack overflow is to track the stack pointer with test and measurement methods. Use timer interrupts that periodically check the location of the stack pointer, record the largest value, and watch that it does not grow beyond that value.
The most-common cause of stack overflow is excessively deep or infinite recursion, in which a function calls itself so many times that the space needed to store the variables and information associated with each call is more than can fit on the stack.
While knowing the recursive solution is not a bad thing, one should also realize that many times the iterative solution is better. A number of ways of approaching converting a recursive algorithm to an iterative one can be seen on Stack Overflow at Way to go from recursion to iteration.
To the general question of "methods to avoid a stack overflow in a recursive algorithm"... Another approach is to include a recursion counter. This is more for detecting infinite loops caused by situations beyond one's control (and poor coding). Each time you make a call, you increment the counter.
Tail recursion is a recursion of a function where it does not consumes stack space and hence prevents stack overflow. If the recursive function is made tail-recursive then it is more efficient than a non-tail-recursive function because every function call does not need to go on stack and pop when the call is done.
If you are using recursive function, since you don’t have control on call stack and the stack is limited, the stack-overflow might occur when the recursive call’s depth gets too deep.
I don't think you can do it with the abstractions the library provides. This is a very natural definition of many-till
, and would work fine in Parsec, the library that Kern is obviously inspired by. But Clojure doesn't have Haskell's lazy evaluation and automatic trampolining, so the nested lambdas that many-till
builds inevitably consume unbounded stack. You would need an implementation more like its implementation of many
, which builds a parser by hand out of a function. I would include its source below, but I don't own it and so I don't think I am authorized to give Stack Overflow a CC BY-SA 4.0 license to it, as posting it would do. Instead, here is a link to its source.
There is already a good answer discussing the specifics of recursion with this particular library.
On the more general question of parsing data using Clojure, you should check out the high-quality parser library Instaparse. It is written in Clojure, and is very powerful. You could either use it directly, or for comparison purposes.
See also the video from Clojure/West.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With