Reducing the memory footprint of a C# application

I am developing a C# application that needs to process approximately 4,000,000 English sentences. All of these sentences are stored in a tree, where each node is a class with these fields:

class TreeNode
{
    protected string word;
    protected Dictionary<string, TreeNode> children;
}

My problem is that the application uses up all the RAM (I have 2 GB) by the time it reaches the 2,000,000th sentence. So it only manages to process half the sentences before it slows down drastically.

What can I do to try and reduce the memory footprint of the application?

EDIT: Let me explain my application a bit more. I have approximately 300,000 English sentences, and from each sentence I generate further sub-sentences like this:

Example sentence: Football is a very popular sport

Sub-sentences I need:

  1. Football is a very popular sport
  2. is a very popular sport
  3. a very popular sport
  4. very popular sport
  5. popular sport
  6. sport

Each sentence is stored in the tree word by word. Taking the example above, I have a TreeNode with the word field = "Football", and its children collection contains the TreeNode for the word "is". The child of the "is" node is the "a" node, and the child of the "a" node is the "very" node. I need to store the sentences word by word because I need to be able to search for all the sentences starting with a given prefix, for example "Football is".
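The structure described above could be sketched roughly as follows. This is only an illustration under assumptions: the method names AddSentence, Find and GetOrAddChild are hypothetical and not from the original code, and allocating the children dictionary lazily is a choice made here so that leaf nodes do not carry an empty dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the word-by-word tree from the question.
class TreeNode
{
    protected string word;
    protected Dictionary<string, TreeNode> children;

    public TreeNode(string word) { this.word = word; }

    // Returns the child for the given word, creating it on first use.
    // Assumption: the dictionary is only allocated when a first child is added.
    TreeNode GetOrAddChild(string w)
    {
        if (children == null)
            children = new Dictionary<string, TreeNode>(StringComparer.Ordinal);
        TreeNode child;
        if (!children.TryGetValue(w, out child))
        {
            child = new TreeNode(w);
            children.Add(w, child);
        }
        return child;
    }

    // Inserts the sentence and each of its sub-sentences (suffixes), word by word.
    public void AddSentence(string sentence)
    {
        string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        for (int start = 0; start < words.Length; start++)
        {
            TreeNode node = this;
            for (int i = start; i < words.Length; i++)
                node = node.GetOrAddChild(words[i]);
        }
    }

    // Follows the words of the prefix from this node; returns the node reached,
    // or null if no stored sentence starts with that prefix.
    public TreeNode Find(string prefix)
    {
        TreeNode node = this;
        foreach (string w in prefix.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            TreeNode next;
            if (node.children == null || !node.children.TryGetValue(w, out next))
                return null;
            node = next;
        }
        return node;
    }
}
```

With a root created as new TreeNode(null), calling AddSentence("Football is a very popular sport") stores all six sub-sentences, and Find("Football is") returns the "is" node under "Football".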

So basically, for each word in a sentence I create a new sub-sentence, which is why I ultimately end up with 4,000,000 different sentences. Storing the data in a database is not an option, since the app needs to work on the whole structure at once, and having to keep writing all the data to a database would slow the process down even further.

Thanks

PB_MLT asked Jan 02 '10 00:01



2 Answers

What are you using as the key? Where are you getting the data from? If the keys are words (not full sentences), I'm wondering whether you have a lot of duplicated keys (different string instances with the same value), in which case you might benefit from implementing a local interner to re-use the values (and let the transient copies get garbage collected).

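using System;
using System.Collections.Generic;

// Local interner: maps every string value to one canonical instance.
public sealed class StringCache {
    private readonly Dictionary<string,string> values
        = new Dictionary<string,string>(StringComparer.Ordinal);
    public string this[string value] {
        get {
            string cached;
            // First time this value is seen: keep this instance as the canonical one.
            if (!values.TryGetValue(value, out cached)) {
                values.Add(value, value);
                cached = value;
            }
            // Otherwise the earlier instance is returned and the new copy can be collected.
            return cached;
        }
    }
}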

Instantiate this when building the tree, and use it whenever you think a value is likely to be duplicated:

StringCache cache = new StringCache(); // re-use this instance while building
                                       // your tree
...
string s = ... // whatever (from reading your input)
s = cache[s];
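As a rough sketch of how this might be applied to the sentences in the question: each line is split into words and every word is passed through the cache before it is stored. The InternDemo class and the SplitAndIntern helper are hypothetical names introduced only for this illustration, and StringCache is repeated so that the snippet is self-contained.

```csharp
using System;
using System.Collections.Generic;

public sealed class StringCache
{
    private readonly Dictionary<string, string> values
        = new Dictionary<string, string>(StringComparer.Ordinal);

    public string this[string value]
    {
        get
        {
            string cached;
            if (!values.TryGetValue(value, out cached))
            {
                values.Add(value, value);
                cached = value;
            }
            return cached;
        }
    }
}

// Hypothetical helper, not part of the original answer.
public static class InternDemo
{
    // Splits a line into words and replaces each word with its cached instance,
    // so equal words from different sentences end up sharing one string object.
    public static string[] SplitAndIntern(string line, StringCache cache)
    {
        string[] words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < words.Length; i++)
            words[i] = cache[words[i]];
        return words;
    }
}
```

If two different sentences both contain "very", object.ReferenceEquals on the two interned words returns true, so the tree would hold one string per distinct word rather than one per occurrence.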
Marc Gravell answered Oct 30 '22 01:10


The Dictionary type itself can consume a lot of memory. Have you considered using a List<KeyValuePair<string, TreeNode>> instead? The generic List uses a lot less memory per instance than a generic Dictionary.

Of course, the limitation of using a List instead of a Dictionary is that you don't get automatic indexing by string, so every lookup is a linear search. This is a clear trade-off between time and space. If the lists are short, a List might even be faster than the dictionary (a linear search of ~10 keys is often going to be faster than a hashtable lookup). Even if only most of the lists are short, it could still be a large improvement overall (e.g. if 95% of the lists have 10 or fewer items, and the other 5% have a maximum of maybe 100 items).

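Note that Collection<KeyValuePair<string, TreeNode>> is unlikely to save anything further: Collection<T> wraps an IList<T> (a List<T> by default), so it carries at least the same storage as the list it wraps. To go below List<T>, a plain KeyValuePair<string, TreeNode>[] array sized exactly to the number of children is the smaller option.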
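A minimal sketch of what a list-backed node might look like, assuming ordinal string comparison and lazy allocation of the list. The class name ListTreeNode and the methods FindChild, GetOrAddChild and Trim are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical list-backed variant of the question's TreeNode.
class ListTreeNode
{
    protected string word;
    protected List<KeyValuePair<string, ListTreeNode>> children;

    public ListTreeNode(string word) { this.word = word; }

    // Linear search over the children; returns null when the word is not present.
    public ListTreeNode FindChild(string w)
    {
        if (children == null) return null;
        for (int i = 0; i < children.Count; i++)
        {
            if (string.Equals(children[i].Key, w, StringComparison.Ordinal))
                return children[i].Value;
        }
        return null;
    }

    // Returns the existing child for the word, or appends a new one.
    public ListTreeNode GetOrAddChild(string w)
    {
        ListTreeNode child = FindChild(w);
        if (child == null)
        {
            child = new ListTreeNode(w);
            if (children == null)
                children = new List<KeyValuePair<string, ListTreeNode>>(1);
            children.Add(new KeyValuePair<string, ListTreeNode>(w, child));
        }
        return child;
    }

    // Optional: asks every list to release unused capacity once the tree is built.
    public void Trim()
    {
        if (children == null) return;
        children.TrimExcess();
        foreach (KeyValuePair<string, ListTreeNode> pair in children)
            pair.Value.Trim();
    }
}
```

Whether this actually beats the dictionary version depends on how many children typical nodes have, so it is worth measuring on the real data before committing to it.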

Eilon answered Oct 30 '22 01:10