Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I speed up recursive search function?

I am having trouble with the speed of the search function that I wrote. The function steps are described below:

  1. The function begins with two table name parameters, a starting-point and a target
  2. The function then traverses a list of table-column combinations (50,000 long) and retrieves all the combinations associated with the starting-point table.
  3. The function then loops through each of the retrieved combinations and for each combination, it traverses the table-column combinations list once again, but this time looking for tables that match the given column.
  4. Finally, the function loops through each of the retrieved combinations from the last step and for each combination, it checks whether the table is the same as the target table; if so it saves it, and if not it calls itself passing in the table name form that combination.

The function aim is to be able to trace a link between tables where the link is direct or has multiple degrees of separation. The level of recursion is a fixed integer value.

My problem is that any time I try to run this function for two levels of search depth (wouldn't dare try deeper at this stage), the job runs out of memory, or I lose patience. I waited for 17mins before the job ran out of memory once.

The average number of columns per table is 28 and the standard deviation is 34.

Here is a diagram showing examples of the various links that can be made between tables:

Each column can have a match in multiple tables. Each matching table can then be searched column by column for tables with matching columns and so on

Here is my code:

private void FindLinkingTables(List<TableColumns> sourceList, TableSearchNode parentNode, string targetTable, int maxSearchDepth)
{
    if (parentNode.Level < maxSearchDepth)
    {
        IEnumerable<string> tableColumns = sourceList.Where(x => x.Table.Equals(parentNode.Table)).Select(x => x.Column);

        foreach (string sourceColumn in tableColumns)
        {
            string shortName = sourceColumn.Substring(1);

            IEnumerable<TableSearchNode> tables = sourceList.Where(
                x => x.Column.Substring(1).Equals(shortName) && !x.Table.Equals(parentNode.Table) && !parentNode.Ancenstory.Contains(x.Table)).Select(
                    x => new TableSearchNode { Table = x.Table, Column = x.Column, Level = parentNode.Level + 1 });
            foreach (TableSearchNode table in tables)
            {
                parentNode.AddChildNode(sourceColumn, table);
                if (!table.Table.Equals(targetTable))
                {
                    FindLinkingTables(sourceList, table, targetTable, maxSearchDepth);
                }
                else
                {
                    table.NotifySeachResult(true);
                }
            }
        }
    }
}

EDIT: separated out TableSearchNode logic and added property and method for completeness

//TableSearchNode
public Dictionary<string, List<TableSearchNode>> Children { get; private set; }

//TableSearchNode
public List<string> Ancenstory
{
    get
    {
        Stack<string> ancestory = new Stack<string>();
        TableSearchNode ancestor = ParentNode;
        while (ancestor != null)
        {
            ancestory.Push(ancestor.tbl);
            ancestor = ancestor.ParentNode;
        }
        return ancestory.ToList();
    }
}

//TableSearchNode
public void AddChildNode(string referenceColumn, TableSearchNode childNode)
    {
        childNode.ParentNode = this;
        List<TableSearchNode> relatedTables = null;
        Children.TryGetValue(referenceColumn, out relatedTables);
        if (relatedTables == null)
        {
            relatedTables = new List<TableSearchNode>();
            Children.Add(referenceColumn, relatedTables);
        }
        relatedTables.Add(childNode);
    }

Thanks in advance for your help!

like image 651
Sinker Avatar asked Jun 13 '14 12:06

Sinker


People also ask

How to make recursive code more efficient?

Bottom-up. Sometimes the best way to improve the efficiency of a recursive algorithm is to not use recursion at all. In the case of generating Fibonacci numbers, an iterative technique called the bottom-up approach can save us both time and space.

How do you make a recursive function faster in Python?

Recursive method calls in Python cause a new stack frame allocation for every call. If you can iterate over a list instead then you avoid this allocation and will see a tremendous speed increase.

How does recursion cause stack overflow?

The most-common cause of stack overflow is excessively deep or infinite recursion, in which a function calls itself so many times that the space needed to store the variables and information associated with each call is more than can fit on the stack.


1 Answers

You really wasting a lot of memory. What immediately comes to mind:

  1. First of all replace incoming List<TableColumns> sourceList with ILookup<string, TableColumns>. You should do this once before calling FindLinkingTables:

    ILookup<string, TableColumns> sourceLookup = sourceList.ToLookup(s => s.Table);
    FindLinkingTables(sourceLookup, parentNode, targetTable, maxSearchDepth);
    
  2. Do not call .ToList() if do not really need it. For example, if you are going only to enumerate all children of resulting list once, you do not need it. So your main function will looks like this:

    private void FindLinkingTables(ILookup<string, TableColumns> sourceLookup, TableSearchNode parentNode, string targetTable, int maxSearchDepth)
    {
        if (parentNode.Level < maxSearchDepth)
        {
            var tableColumns = sourceLookup[parentNode.Table].Select(x => x.Column);
    
            foreach (string sourceColumn in tableColumns)
            {
                string shortName = sourceColumn.Substring(1);
    
                var tables = sourceLookup
                    .Where(
                        group => !group.Key.Equals(parentNode.Table)
                                 && !parentNode.Ancenstory.Contains(group.Key))
                    .SelectMany(group => group)
                    .Where(tableColumn => tableColumn.Column.Substring(1).Equals(shortName))
                    .Select(
                        x => new TableSearchNode
                        {
                            Table = x.Table,
                            Column = x.Column,
                            Level = parentNode.Level + 1
                        });
    
                foreach (TableSearchNode table in tables)
                {
                    parentNode.AddChildNode(sourceColumn, table);
                    if (!table.Table.Equals(targetTable))
                    {
                        FindLinkingTables(sourceLookup, table, targetTable, maxSearchDepth);
                    }
                    else
                    {
                        table.NotifySeachResult(true);
                    }
                }
            }
        }
    }
    

    [Edit]

  3. Also in order to speedup remaining complex LINQ query, you can prepare yet another ILookup:

    ILookup<string, TableColumns> sourceColumnLookup = sourceLlist
            .ToLookup(t => t.Column.Substring(1));
    
    //...
    
    private void FindLinkingTables(
        ILookup<string, TableColumns> sourceLookup, 
        ILookup<string, TableColumns> sourceColumnLookup,
        TableSearchNode parentNode, string targetTable, int maxSearchDepth)
    {
        if (parentNode.Level >= maxSearchDepth) return;
    
        var tableColumns = sourceLookup[parentNode.Table].Select(x => x.Column);
    
        foreach (string sourceColumn in tableColumns)
        {
            string shortName = sourceColumn.Substring(1);
    
            var tables = sourceColumnLookup[shortName]
                .Where(tableColumn => !tableColumn.Table.Equals(parentNode.Table)
                                      && !parentNode.AncenstoryReversed.Contains(tableColumn.Table))
                .Select(
                    x => new TableSearchNode
                        {
                            Table = x.Table,
                            Column = x.Column,
                            Level = parentNode.Level + 1
                        });
    
    
            foreach (TableSearchNode table in tables)
            {
                parentNode.AddChildNode(sourceColumn, table);
                if (!table.Table.Equals(targetTable))
                {
                    FindLinkingTables(sourceLookup, sourceColumnLookup, table, targetTable, maxSearchDepth);
                }
                else
                {
                    table.NotifySeachResult(true);
                }
            }
        }
    }
    
  4. I've checked your Ancestory property. If IEnumerable<string> is enough for your needs check this implementation:

    public IEnumerable<string> AncenstoryEnum
    {
        get { return AncenstoryReversed.Reverse(); }
    }
    
    public IEnumerable<string> AncenstoryReversed
    {
        get
        {
            TableSearchNode ancestor = ParentNode;
            while (ancestor != null)
            {
                yield return ancestor.tbl;
                ancestor = ancestor.ParentNode;
            }
        }
    }
    
like image 55
Woodman Avatar answered Sep 28 '22 00:09

Woodman