Does passing a variable with a large amount of data cost a lot of memory and time in Mathematica?

I am coding up an algorithm for constructing a suffix tree in Mathematica based on Ukkonen's algorithm.

My question is: will passing my entire tree structure (which I have stored in a list) to a function that searches it cost my program a lot of memory and time, given that I have to call some of these functions many times in the algorithm?

For example, I have a function that searches for the children of a specific node, and I use the Select function to search the entire tree.

getChildren[parentID_] := Select[tree, #[[3]] == parentID &];

However, the function needs access to the tree, so is it reasonable to pass the entire tree structure to it? It doesn't seem that there is a way to make a variable global to the entire notebook. Or is there some alternative way to get around this?

asked Nov 30 '11 10:11 by Steve



1 Answer

No, it does not cost extra memory to pass expressions. As is usual in functional languages, Mathematica objects are immutable: they cannot be modified. Instead, a new object is created when you transform them using some function. This also means that if you don't transform them, they're not copied, no matter how much you pass them around between functions.
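As a minimal sketch of this behaviour (the node layout `{id, label, parentID}` and the helper names `countNodes` and `addNode` are made up for illustration and are not from the question), a function that only reads its argument works on the same expression, while a function that "modifies" it returns a new expression and leaves the original untouched:

```mathematica
(* Hypothetical small tree; each node is {id, label, parentID} *)
tree = {{1, "root", 0}, {2, "a", 1}};

(* Read-only use: the function just inspects the expression it is given *)
countNodes[t_List] := Length[t];

(* "Modifying" use: Append returns a new list and leaves its argument alone *)
addNode[t_List, node_List] := Append[t, node];

newTree = addNode[tree, {3, "b", 1}];

countNodes[tree]      (* 2: the original is unchanged *)
countNodes[newTree]   (* 3 *)
```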


From a user perspective, Mathematica expressions are trees, but I believe that internally they're stored as directed acyclic graphs, i.e. the same subexpression may be stored only once in memory, regardless of how many times it appears in the full expression (see e.g. the doc page of Share[]).

Here's an example to illustrate:

First, make sure In/Out don't take up extra memory:

In[1]:= $HistoryLength = 0;

Check memory usage:

In[2]:= MemoryInUse[]
Out[2]= 13421756

Let's make an expression that takes up a noticeable amount of memory:

In[3]:= s = f@Range[1000000];

In[4]:= MemoryInUse[]
Out[4]= 17430260

Now repeat this expression a hundred times ...

In[5]:= t = ConstantArray[s, 100];

... and notice that memory usage barely increases:

In[6]:= MemoryInUse[]
Out[6]= 18264676

ByteCount[] is misleading here because it reports not the physical memory actually used, but the memory that would be used if common subexpressions weren't allowed to share the same memory:

In[7]:= ByteCount[t]
Out[7]= 400018040

An interesting point to note: if you remove f[...] from s, so that both s and t are plain numerical arrays, then this memory sharing will not happen, and memory usage will jump to ~400 MB.
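One plausible explanation, offered here as an assumption rather than something the measurements above establish, is that a rectangular array of machine numbers is stored as a single packed array, which holds the numbers themselves instead of references to shared subexpressions. A small sketch for checking this with Developer`PackedArrayQ (the names sNum, tNum, sSym and tSym and the tiny sizes are chosen only for illustration):

```mathematica
(* Small stand-ins for s and t; the sizes are arbitrary *)
sNum = Range[10];                 (* plain list of machine integers *)
tNum = ConstantArray[sNum, 3];    (* rectangular numerical array *)

sSym = f[Range[10]];              (* same data wrapped in an undefined head f *)
tSym = ConstantArray[sSym, 3];    (* list of f[...] expressions *)

(* Check which of the two results is stored as a packed array *)
Developer`PackedArrayQ[tNum]
Developer`PackedArrayQ[tSym]
```

If the first check returns True and the second False, that is consistent with the numerical version being stored as one block of numbers with no subexpressions to share.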


Whether you make tree a global variable or an argument of getChildren, it will not make a difference in memory usage.
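The two variants can be sketched side by side. The sample data and the node layout `{id, label, parentID}` are hypothetical; only the use of position 3 for the parent ID comes from the question:

```mathematica
(* Hypothetical node layout: {id, label, parentID} *)
tree = {{1, "root", 0}, {2, "a", 1}, {3, "b", 1}, {4, "c", 2}};

(* Version that reads the global variable tree, as in the question *)
getChildren[parentID_] := Select[tree, #[[3]] == parentID &];

(* Version that receives the tree as an explicit argument *)
getChildren[t_List, parentID_] := Select[t, #[[3]] == parentID &];

getChildren[1]         (* {{2, "a", 1}, {3, "b", 1}} *)
getChildren[tree, 1]   (* {{2, "a", 1}, {3, "b", 1}} *)
```

Both definitions can coexist because they take different numbers of arguments. The explicit-argument form makes the dependency on the tree visible and lets the same function work on more than one tree.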

answered Sep 30 '22 16:09 by Szabolcs
