I am coding up an algorithm for constructing a suffix tree in Mathematica based on Ukkonen's algorithm.
My question: will passing my entire tree structure (which I store in a list) to a function that searches it cost my program a lot of memory and time, given that I have to call some of these functions many times in the algorithm?
For example, I have a function that searches for the children of a specific node, and I use the Select function to search the entire tree.
getChildren[parentID_] := Select[tree, #[[3]] == parentID &];
However, the function needs access to the tree, so is it reasonable to pass the entire tree structure to it as an argument? It doesn't seem that there is a way to make a variable global to the entire notebook. Or is there some other way around this?
No, it does not cost extra memory to pass expressions. As is usual in functional languages, Mathematica objects are immutable: they cannot be modified; instead, a new object is created when you transform them using some function. This also means that if you don't transform them, they're not copied, no matter how much you pass them around between functions.
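A minimal sketch of what "transforming creates a new object" means in practice. The names a and b are only for illustration:

```mathematica
a = {1, 2, 3};
b = ReplacePart[a, 1 -> 99];   (* builds a new list; a is left untouched *)
a   (* {1, 2, 3} *)
b   (* {99, 2, 3} *)
```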
From a user perspective, Mathematica expressions are trees, but I believe that internally they're stored as directed acyclic graphs, i.e. the same subexpression may be stored only once in memory, regardless of how many times it appears in the full expression (see e.g. the doc page of Share[]).
Here's an example to illustrate:
First, make sure In/Out don't take up extra memory:
In[1]:= $HistoryLength = 0;
Check memory usage:
In[2]:= MemoryInUse[]
Out[2]= 13421756
Let's make an expression that takes up a noticeable amount of memory:
In[3]:= s = f@Range[1000000];
In[4]:= MemoryInUse[]
Out[4]= 17430260
Now repeat this expression a hundred times ...
In[5]:= t = ConstantArray[s, 100];
... and notice that memory usage barely increases:
In[6]:= MemoryInUse[]
Out[6]= 18264676
ByteCount[] is misleading because it doesn't report the actual physical memory used, but the memory that would be used if common subexpressions weren't allowed to share the same memory:
In[7]:= ByteCount[t]
Out[7]= 400018040
An interesting point to note: if you remove f[...] from s, and make both s and t a plain numerical array, then this memory sharing will not happen, and memory usage will jump to ~400 MB.
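A likely explanation, offered here as an assumption rather than something stated above, is that plain numerical arrays are stored as packed arrays, which hold their elements contiguously and so cannot share rows. The sketch below uses much smaller sizes than the example and checks this with Developer`PackedArrayQ; the names s1, t1, s2 and t2 are only for illustration:

```mathematica
(* Wrapped in the undefined symbol f, as in the example above *)
s1 = f@Range[1000];
t1 = ConstantArray[s1, 100];   (* an ordinary list of 100 f[...] expressions *)
Developer`PackedArrayQ[t1]

(* Without f: a 100 x 1000 array of machine integers *)
s2 = Range[1000];
t2 = ConstantArray[s2, 100];
Developer`PackedArrayQ[t2]
```

If the assumption holds, the first check should report that t1 is not packed and the second that t2 is.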
Whether you make tree a global variable or an argument of getChildren, it will not make a difference in memory usage.
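Both variants can be sketched as follows. The node layout {id, label, parentID} and the sample data are assumptions made for illustration; only the convention that the third element holds the parent ID comes from the question. Note that a symbol assigned at top level, like tree here, is already visible everywhere in the notebook by default.

```mathematica
(* Hypothetical sample tree: each node is {id, label, parentID} *)
tree = {{1, "a", 0}, {2, "b", 1}, {3, "c", 1}, {4, "d", 2}};

(* Variant 1: read the global symbol, as in the question *)
getChildren[parentID_] := Select[tree, #[[3]] == parentID &];

(* Variant 2: take the tree as an explicit argument; no copy is made on the call *)
getChildren[t_List, parentID_] := Select[t, #[[3]] == parentID &];

getChildren[1]         (* {{2, "b", 1}, {3, "c", 1}} *)
getChildren[tree, 1]   (* {{2, "b", 1}, {3, "c", 1}} *)
```

The two definitions coexist because they take different numbers of arguments. The explicit-argument form is easier to test and reuse with more than one tree, and it costs nothing extra in memory.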