Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

StackOverflowException in non-infinite, recursive string search

Background. My script encounters a StackOverflowException while recursively searching for specific text in a large string. The loop is not infinite; the problem occurs (for a specific search) between 9,000-10,000 legitimate searches -- I need it to keep going. I'm using tail-recursion (I think) and that may be part of my problem, since I gather that C# does not do this well. However, I'm not sure how to avoid using tail-recursion in my case.

Question(s). Why is the StackOverflowException occurring? Does my overall approach make sense? If the design sucks, I'd rather start there, rather than just avoiding an exception. But if the design is acceptable, what can I do about the StackOverflowException?

Code. The class I've written searches for contacts (about 500+ from a specified list) in a large amount of text (about 6MB). The strategy I'm using is to search for the last name, then look for the first name somewhere shortly before or after the last name. I need to find each instance of each contact within the given text. The StringSearcher class has a recursive method that continues to search for contacts, returning the result whenever one is found, but keeping track of where it left off with the search.

I use this class in the following manner:

StringSearcher searcher = new StringSearcher(
    File.ReadAllText(FilePath),
    "lastname",
    "firstname",
    30
);

string searchResult = null;
while ((searchResult = searcher.NextInstance()) != null)
{
    // do something with each searchResult
}

On the whole, the script seems to work. Most contacts return the results I expect. However, The problem seems to occur when the primary search string is extremely common (thousands of hits), and the secondary search string never or rarely occurs. I know it's not getting stuck because the CurrentIndex is advancing normally.

Here's the recursive method I'm talking about.

public string NextInstance()
{
    // Advance this.CurrentIndex to the next location of the primary search string
    this.SearchForNext();

    // Look a little before and after the primary search string
    this.CurrentContext = this.GetContextAtCurrentIndex();

    // Primary search string found?
    if (this.AnotherInstanceFound)
    {
        // If there is a valid secondary search string, is that found near the
        // primary search string? If not, look for the next instance of the primary
        // search string
        if (!string.IsNullOrEmpty(this.SecondarySearchString) &&
            !this.IsSecondaryFoundInContext())
        {
            return this.NextInstance();
        }
        // 
        else
        {
            return this.CurrentContext;
        }
    }
    // No more instances of the primary search string
    else
    {
        return null;
    }
}

The StackOverflowException occurs on this.CurrentIndex = ... in the following method:

private void SearchForNext()
{
    // If we've already searched once, 
    // increment the current index before searching further.
    if (0 != this.CurrentIndex)
    {
        this.CurrentIndex++;
        this.NumberOfSearches++;
    }

    this.CurrentIndex = this.Source.IndexOf(
            this.PrimarySearchString,
            ValidIndex(this.CurrentIndex),
            StringComparison.OrdinalIgnoreCase
    );

    this.AnotherInstanceFound = !(this.CurrentIndex >= 0) ? false : true;
}

I can include more code if needed. Let me know if one of those methods or variables are questionable.

*Performance is not really a concern because this will likely run at night as a scheduled task.

like image 863
Logical Fallacy Avatar asked Apr 01 '26 06:04

Logical Fallacy


2 Answers

You have a 1MB stack. When that stack space runs out and you still need more stack space a StackOverflowException is thrown. This may or may not be a result of infinite recursion, the runtime has no idea. Infinite recursion is simply one effective way of using more stack space then is available (by using an infinite amount). You can be using a finite amount that just so happens to be more than is available and you'll get the same exception.

While there are other ways to use up lots of stack space, recursion is one of the most effective. Each method is adding more space based on the signature and locals of that method. Having deep recursion can use a lot of stack space, so if you expect to have a depth of more than a few hundred levels (and even that is a lot) you should probably not use recursion. Note that any code using recursion can be written iterativly, or to use an explicit Stack.

It's hard to say, as a complete implementation isn't shown, but based on what I can see you are more or less writing an iterator, but you're not using the C# constructs for one (namely IEnumerable).

My guess is "iterator blocks" will allow you to make this algorithm both easier to write, easier to write non-recursively, and more effective from the caller's side.

Here is a high level look at how you might structure this method as an iterator block:

public static IEnumerable<string> SearchString(string text
    , string firstString, string secondString, int unknown)
{
    int lastIndexFound = text.IndexOf(firstString);

    while (lastIndexFound >= 0)
    {
        if (secondStringNearFirst(text, firstString, secondString, lastIndexFound))
        {
            yield return lastIndexFound.ToString();
        }
    }
}

private static bool secondStringNearFirst(string text
    , string firstString, string secondString, int lastIndexFound)
{
    throw new NotImplementedException();
}
like image 140
Servy Avatar answered Apr 02 '26 18:04

Servy


It doesn't seem like recursion is the right solution here. Normally with recursive problems you have some state you pass to the recursive step. In this case, you really have a plain while loop. Below I put your method body in a loop and changed the recursive step to continue. See if that works...

public string NextInstance()
{
    while (true)
    {
        // Advance this.CurrentIndex to the next location of the primary search string
        this.SearchForNext();

        // Look a little before and after the primary search string
        this.CurrentContext = this.GetContextAtCurrentIndex();

        // Primary search string found?
        if (this.AnotherInstanceFound)
        {
            // If there is a valid secondary search string, is that found near the
            // primary search string? If not, look for the next instance of the primary
            // search string
            if (!string.IsNullOrEmpty(this.SecondarySearchString) &&
                !this.IsSecondaryFoundInContext())
            {
                continue; // Start searching again...
            }
            // 
            else
            {
                return this.CurrentContext;
            }
        }
        // No more instances of the primary search string
        else
        {
            return null;
        }
    }
}
like image 39
David Avatar answered Apr 02 '26 20:04

David