Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoiding stack overflow from recursion in Clojure

I'm new to Clojure and am having trouble figuring out how to avoid stack overflows in certain situations. One such situation came up while trying to port a parsing project to Clojure with a parser combinator library I found called kern.

Kern defines a recursive implementation for a "many-till" parser: source

This works fine for small inputs:

(def input-text "The blue {cat} and the red {dog} became best friends with the white {wolf} END {not included}")

(def  between-brackets "parser that grabs all text between brackets"
  (between (sym* \{) (sym* \}) (<+> (many (none-of* "}")))))

(def parse-enclosed-words "parser that attempts to grab text between brackets, 
skips a character when it can't, and halts when it hits the string END"
  (many-till (<|> between-brackets (skip any-char)) (token* "END")))

(filter #(some? %) (value parse-enclosed-words input-text)) ;; => ("cat" "dog" "wolf")

Unfortunately, the parser encounters stack overflows as input strings grow:

(def file-input-text (slurp (io/resource "[input-text-20x-repeated.txt][2]") ))
(filter #(some? %) (value parse-enclosed-words file-input-text)) ;; => Unhandled java.lang.StackOverflowError

From what I can tell by reading online, this is probably due to the function using stack-consuming recursion. I played around with some attempts to re-write the function using the "recur" keyword, but since the recursive call is not in tail position this didn't seem to work.

How can many-till be modified to avoid stack overflow?

like image 944
Unsatisfied Zebra Avatar asked Sep 17 '21 05:09

Unsatisfied Zebra


People also ask

How does stack overflow prevent recursion?

In order to prevent stack overflow bugs, you must have a base case where the function stops make new recursive calls. If there is no base case then the function calls will never stop and eventually a stack overflow will occur.

How can a stack overflow be avoided?

One method to prevent stack overflow is to track the stack pointer with test and measurement methods. Use timer interrupts that periodically check the location of the stack pointer, record the largest value, and watch that it does not grow beyond that value.

Can recursion cause stack overflow?

The most-common cause of stack overflow is excessively deep or infinite recursion, in which a function calls itself so many times that the space needed to store the variables and information associated with each call is more than can fit on the stack.

Is it bad to know recursion?

While knowing the recursive solution is not a bad thing, one should also realize that many times the iterative solution is better. A number of ways of approaching converting a recursive algorithm to an iterative one can be seen on Stack Overflow at Way to go from recursion to iteration.

How to avoid stack overflow in a recursive algorithm?

To the general question of "methods to avoid a stack overflow in a recursive algorithm"... Another approach is to include a recursion counter. This is more for detecting infinite loops caused by situations beyond one's control (and poor coding). Each time you make a call, you increment the counter.

What is tail recursion in C++?

Tail recursion is a recursion of a function where it does not consumes stack space and hence prevents stack overflow. If the recursive function is made tail-recursive then it is more efficient than a non-tail-recursive function because every function call does not need to go on stack and pop when the call is done.

Why do we get stack-overflow when we call a recursive function?

If you are using recursive function, since you don’t have control on call stack and the stack is limited, the stack-overflow might occur when the recursive call’s depth gets too deep.


Video Answer


2 Answers

I don't think you can do it with the abstractions the library provides. This is a very natural definition of many-till, and would work fine in Parsec, the library that Kern is obviously inspired by. But Clojure doesn't have Haskell's lazy evaluation and automatic trampolining, so the nested lambdas that many-till builds inevitably consume unbounded stack. You would need an implementation more like its implementation of many, which builds a parser by hand out of a function. I would include its source below, but I don't own it and so I don't think I am authorized to give Stack Overflow a CC BY-SA 4.0 license to it, as posting it would do. Instead, here is a link to its source.

like image 185
amalloy Avatar answered Oct 24 '22 05:10

amalloy


There is already a good answer discussing the specifics of recursion with this particular library.

On the more general question of parsing data using Clojure, you should check out the high-quality parser library Instaparse. It is written in Clojure, and is very powerful. You could either use it directly, or for comparison purposes.

See also the video from Clojure/West.

like image 30
Alan Thompson Avatar answered Oct 24 '22 03:10

Alan Thompson