I am having trouble with the speed of the search function that I wrote. The function steps are described below:
The function aim is to be able to trace a link between tables where the link is direct or has multiple degrees of separation. The level of recursion is a fixed integer value.
My problem is that any time I try to run this function for two levels of search depth (wouldn't dare try deeper at this stage), the job runs out of memory, or I lose patience. I waited for 17mins before the job ran out of memory once.
The average number of columns per table is 28 and the standard deviation is 34.
Here is a diagram showing examples of the various links that can be made between tables:
Here is my code:
private void FindLinkingTables(List<TableColumns> sourceList, TableSearchNode parentNode, string targetTable, int maxSearchDepth)
{
if (parentNode.Level < maxSearchDepth)
{
IEnumerable<string> tableColumns = sourceList.Where(x => x.Table.Equals(parentNode.Table)).Select(x => x.Column);
foreach (string sourceColumn in tableColumns)
{
string shortName = sourceColumn.Substring(1);
IEnumerable<TableSearchNode> tables = sourceList.Where(
x => x.Column.Substring(1).Equals(shortName) && !x.Table.Equals(parentNode.Table) && !parentNode.Ancenstory.Contains(x.Table)).Select(
x => new TableSearchNode { Table = x.Table, Column = x.Column, Level = parentNode.Level + 1 });
foreach (TableSearchNode table in tables)
{
parentNode.AddChildNode(sourceColumn, table);
if (!table.Table.Equals(targetTable))
{
FindLinkingTables(sourceList, table, targetTable, maxSearchDepth);
}
else
{
table.NotifySeachResult(true);
}
}
}
}
}
EDIT: separated out TableSearchNode logic and added property and method for completeness
//TableSearchNode
public Dictionary<string, List<TableSearchNode>> Children { get; private set; }
//TableSearchNode
public List<string> Ancenstory
{
get
{
Stack<string> ancestory = new Stack<string>();
TableSearchNode ancestor = ParentNode;
while (ancestor != null)
{
ancestory.Push(ancestor.tbl);
ancestor = ancestor.ParentNode;
}
return ancestory.ToList();
}
}
//TableSearchNode
public void AddChildNode(string referenceColumn, TableSearchNode childNode)
{
childNode.ParentNode = this;
List<TableSearchNode> relatedTables = null;
Children.TryGetValue(referenceColumn, out relatedTables);
if (relatedTables == null)
{
relatedTables = new List<TableSearchNode>();
Children.Add(referenceColumn, relatedTables);
}
relatedTables.Add(childNode);
}
Thanks in advance for your help!
Bottom-up. Sometimes the best way to improve the efficiency of a recursive algorithm is to not use recursion at all. In the case of generating Fibonacci numbers, an iterative technique called the bottom-up approach can save us both time and space.
Recursive method calls in Python cause a new stack frame allocation for every call. If you can iterate over a list instead then you avoid this allocation and will see a tremendous speed increase.
The most-common cause of stack overflow is excessively deep or infinite recursion, in which a function calls itself so many times that the space needed to store the variables and information associated with each call is more than can fit on the stack.
You really wasting a lot of memory. What immediately comes to mind:
First of all replace incoming List<TableColumns> sourceList
with ILookup<string, TableColumns>
. You should do this once before calling FindLinkingTables
:
ILookup<string, TableColumns> sourceLookup = sourceList.ToLookup(s => s.Table);
FindLinkingTables(sourceLookup, parentNode, targetTable, maxSearchDepth);
Do not call .ToList()
if do not really need it. For example, if you are going only to enumerate all children of resulting list once, you do not need it. So your main function will looks like this:
private void FindLinkingTables(ILookup<string, TableColumns> sourceLookup, TableSearchNode parentNode, string targetTable, int maxSearchDepth)
{
if (parentNode.Level < maxSearchDepth)
{
var tableColumns = sourceLookup[parentNode.Table].Select(x => x.Column);
foreach (string sourceColumn in tableColumns)
{
string shortName = sourceColumn.Substring(1);
var tables = sourceLookup
.Where(
group => !group.Key.Equals(parentNode.Table)
&& !parentNode.Ancenstory.Contains(group.Key))
.SelectMany(group => group)
.Where(tableColumn => tableColumn.Column.Substring(1).Equals(shortName))
.Select(
x => new TableSearchNode
{
Table = x.Table,
Column = x.Column,
Level = parentNode.Level + 1
});
foreach (TableSearchNode table in tables)
{
parentNode.AddChildNode(sourceColumn, table);
if (!table.Table.Equals(targetTable))
{
FindLinkingTables(sourceLookup, table, targetTable, maxSearchDepth);
}
else
{
table.NotifySeachResult(true);
}
}
}
}
}
[Edit]
Also in order to speedup remaining complex LINQ query, you can prepare yet another ILookup
:
ILookup<string, TableColumns> sourceColumnLookup = sourceLlist
.ToLookup(t => t.Column.Substring(1));
//...
private void FindLinkingTables(
ILookup<string, TableColumns> sourceLookup,
ILookup<string, TableColumns> sourceColumnLookup,
TableSearchNode parentNode, string targetTable, int maxSearchDepth)
{
if (parentNode.Level >= maxSearchDepth) return;
var tableColumns = sourceLookup[parentNode.Table].Select(x => x.Column);
foreach (string sourceColumn in tableColumns)
{
string shortName = sourceColumn.Substring(1);
var tables = sourceColumnLookup[shortName]
.Where(tableColumn => !tableColumn.Table.Equals(parentNode.Table)
&& !parentNode.AncenstoryReversed.Contains(tableColumn.Table))
.Select(
x => new TableSearchNode
{
Table = x.Table,
Column = x.Column,
Level = parentNode.Level + 1
});
foreach (TableSearchNode table in tables)
{
parentNode.AddChildNode(sourceColumn, table);
if (!table.Table.Equals(targetTable))
{
FindLinkingTables(sourceLookup, sourceColumnLookup, table, targetTable, maxSearchDepth);
}
else
{
table.NotifySeachResult(true);
}
}
}
}
I've checked your Ancestory
property. If IEnumerable<string>
is enough for your needs check this implementation:
public IEnumerable<string> AncenstoryEnum
{
get { return AncenstoryReversed.Reverse(); }
}
public IEnumerable<string> AncenstoryReversed
{
get
{
TableSearchNode ancestor = ParentNode;
while (ancestor != null)
{
yield return ancestor.tbl;
ancestor = ancestor.ParentNode;
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With