
How to store millions of Double during a calculation?

My engine executes 1,000,000 simulations on X deals. During each simulation, for each deal, a specific condition may be met. When it is, I store the value (which is a double) in an array. Each deal has its own list of values (i.e. these values are independent from one deal to another).

At the end of all the simulations, for each deal, I run an algorithm on its List<Double> to get some outputs. Unfortunately, this algorithm requires the complete list of these values, so I am not able to modify it to calculate the outputs "on the fly", i.e. during the simulations.

In "normal" conditions (i.e. X is low, and the condition is met less than 10% of the time), the calculation ends correctly, though it could still be improved.

My problem occurs when I have many deals (for example X = 30) and almost all of my simulations meet the condition (say 90% of them). Just to store the values, I need about 900,000 * 30 * 64 bits of memory (about 216 MB). One of my future requirements is to be able to run 5,000,000 simulations...
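As a rough check of the figures above, here is a minimal sketch of the arithmetic. The class and method names are hypothetical, and the result covers only the raw 8-byte payload of each value. A List<Double> will need noticeably more than this, because every boxed Double is a separate object with its own header and the list also holds a reference to it; the exact overhead depends on the JVM.

```java
// Back-of-the-envelope estimate of the raw payload for the stored values.
// Hypothetical helper, not part of the engine described in the question.
final class MemoryEstimate {
    private static final int BYTES_PER_DOUBLE = 8; // a Java double is 64 bits

    /** Bytes needed for the values as primitive doubles, ignoring all container overhead. */
    static long rawBytes(long simulations, int hitPercent, int deals) {
        long valuesPerDeal = simulations * hitPercent / 100;
        return valuesPerDeal * deals * BYTES_PER_DOUBLE;
    }

    public static void main(String[] args) {
        System.out.println(rawBytes(1000000L, 90, 30)); // current worst case
        System.out.println(rawBytes(5000000L, 90, 30)); // future requirement
    }
}
```

With 5,000,000 simulations the same worst case is five times larger, so the raw payload alone passes 1 GB.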

So I can't continue with my current way of storing the values. For the moment, I use a "simple" structure of Map<String, List<Double>>, where the key is the ID of the element and List<Double> is the list of values.

So my question is how can I enhance this specific part of my application in order to reduce the memory usage during the simulations?

Another important note: for the final calculation, my List<Double> (or whatever structure I end up using) must be ordered. So if the solution to my previous question also provides a structure that keeps newly inserted elements in order (such as a SortedMap), that would be really great!

I am using Java 1.6.


Edit 1

My engine is indeed executing some financial calculations, and in my case, all deals are related. This means that I cannot run my calculations on the first deal, get the output, clean the List<Double>, and then move to the second deal, and so on.

Of course, as a temporary solution, we will increase the memory allocated to the engine, but it's not the solution I am expecting ;)


Edit 2

Regarding the algorithm itself: I can't give the exact algorithm here, but here are some hints:

We must work on a sorted List<Double>. I will then calculate an index (which is calculated against a given parameter and the size of the List itself). Then, I finally return the index-th value of this List.

public static double algo(double input, List<Double> sortedList) {
    if (someSpecificCases) {
        return 0;
    }
    // Calculate the index value, using input and also size of the sortedList...
    double index = ...;
    // Specific case where I return the first item of my list.
    if (index == 1) {
        return sortedList.get(0);
    }
    // Specific case where I return the last item of my list.
    if (index == sortedList.size()) {
        return sortedList.get(sortedList.size() - 1);
    }
    // Here, I need the index-th value of my list...
    double val = sortedList.get((int) index);
    double finalValue = someBasicCalculations(val);
    return finalValue;
}

I hope it will help to have such information now...
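Since the algorithm only reads one position of the sorted data, one possible direction (an assumption on my side, not something confirmed for this engine) is to collect the values in a primitive double[] and sort it once after all simulations, rather than keeping a List<Double> ordered on every insert. The DoubleBuffer class below is a hypothetical sketch of that idea; it stores 8 bytes per value plus some spare capacity.

```java
import java.util.Arrays;

// Hypothetical growable buffer of primitive doubles (not from the original post).
final class DoubleBuffer {
    private double[] data = new double[1024];
    private int size = 0;

    /** Appends a value, growing the backing array by 50% when it is full. */
    void add(double v) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size + (size >> 1));
        }
        data[size++] = v;
    }

    int size() {
        return size;
    }

    /** Sorts only the used portion in place; meant to be called once, after all simulations. */
    void sort() {
        Arrays.sort(data, 0, size);
    }

    /** Positional read, the counterpart of sortedList.get(i). */
    double get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return data[i];
    }
}
```

The trade-off is that the data is unsorted until sort() is called, which fits the stated usage where the algorithm runs only after every simulation has finished.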


Edit 3

Currently, I will not consider any hardware modification (too long and complicated here :( ). We will increase the memory, but that is just a quick fix.

I was thinking of a solution that uses a temporary file: up to a certain threshold (for example 100,000), my List<Double> stores new values in memory. When the size of the List<Double> reaches this threshold, I append the list to the temporary file (one file per deal).

Something like that:

public void addNewValue(double v) {
    if (list.size() == 100000) {
        appendListInFile();
        list.clear();
    }
    list.add(v);
}

At the end of the whole calculation, for each deal, I will reconstruct the complete List<Double> from what I have in memory and also in the temporary file. Then, I run my algorithm. I clean the values for this deal, and move to the second deal (I can do that now, as all the simulations are now finished).
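The spill-and-reload idea could look roughly like the sketch below. It is only an illustration under assumptions: SpillingDoubleStore, its binary file format (raw doubles written with DataOutputStream) and the temp-file naming are all hypothetical choices, and a primitive double[] stands in for the List<Double> buffer.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical per-deal store: buffers values in memory and spills them to a temp file.
final class SpillingDoubleStore {
    private final double[] buffer; // in-memory part, at most 'threshold' values
    private int buffered = 0;      // values currently held in the buffer
    private long spilled = 0;      // values already written to the file
    private final File file;

    SpillingDoubleStore(int threshold) throws IOException {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.buffer = new double[threshold];
        this.file = File.createTempFile("deal-values", ".bin");
        this.file.deleteOnExit();
    }

    void addNewValue(double v) throws IOException {
        if (buffered == buffer.length) {
            appendBufferToFile();
        }
        buffer[buffered++] = v;
    }

    /** Appends the buffered values to the file as raw 8-byte doubles, then empties the buffer. */
    private void appendBufferToFile() throws IOException {
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file, true)));
        try {
            for (int i = 0; i < buffered; i++) {
                out.writeDouble(buffer[i]);
            }
        } finally {
            out.close();
        }
        spilled += buffered;
        buffered = 0;
    }

    /** Rebuilds the full set of values (file contents plus in-memory remainder) and sorts it. */
    double[] loadSorted() throws IOException {
        long total = spilled + buffered;
        if (total > Integer.MAX_VALUE) {
            throw new IllegalStateException("too many values for one array");
        }
        double[] all = new double[(int) total];
        int n = 0;
        if (spilled > 0) {
            DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(file)));
            try {
                while (n < spilled) {
                    all[n++] = in.readDouble();
                }
            } finally {
                in.close();
            }
        }
        System.arraycopy(buffer, 0, all, n, buffered);
        Arrays.sort(all);
        return all;
    }
}
```

During the simulations each deal then holds at most the threshold number of values in memory. The full array for a deal exists only while loadSorted() and the final algorithm run for that deal.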

What do you think of such a solution? Do you think it is acceptable?

Of course I will lose some time to read and write my values in an external file, but I think this can be acceptable, no?

Romain Linsolas asked Oct 14 '10 15:10


4 Answers

Your problem is algorithmic and you are looking for a "reduction in strength" optimization.

Unfortunately, you've been too coy in the problem description when you say "Unfortunately, this algorithm requires the complete list of these values...", which is dubious. The simulation run has already passed a predicate, which in itself tells you something about the sets that pass through the sieve.

I expect the data that meets the criteria has a low information content and therefore is amenable to substantial compression.

Without further information, we really can't help you more.

msw answered Sep 28 '22 18:09


  1. You mentioned that the "engine" is not connected to a database, but have you considered using a database to store the lists of elements? Possibly an embedded DB such as SQLite?

  2. If you used int or even short instead of string for the key field of your Map, that might save some memory.

  3. If you need a collection object that guarantees order, then consider a Queue or a Stack instead of your List that you are currently using.

  4. Possibly think of a way to run deals sequentially, as Dommer and Alan have already suggested.

I hope that was of some help!


EDIT:

Your comment about only having 30 keys is a good point.

  1. In that case, since you have to calculate all your deals at the same time, have you considered serializing your Lists to disk (e.g. as XML)?

  2. Or even just writing a text file to disk for each List, then after the deals are calculated, loading one file/List at a time to verify that List of conditions?

Of course the disadvantage is slow file IO, but this would reduce your server's memory requirement.

JohnB answered Sep 28 '22 18:09


Can you get away with using floats instead of doubles? A float is 32 bits instead of 64, so that would halve the storage and save you roughly 100 MB.
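Whether floats are acceptable depends on how much precision the final outputs need, since a float keeps only about 7 significant decimal digits. The small sketch below (a hypothetical helper, not part of the question's engine) measures the error a value picks up when it is stored as a float and read back, which is one way to test real simulation values before switching.

```java
// Hypothetical helper to gauge the precision lost by storing doubles as floats.
final class FloatVsDouble {

    /** Absolute error introduced by narrowing v to a float and widening it back. */
    static double roundTripError(double v) {
        return Math.abs(v - (double) (float) v);
    }

    public static void main(String[] args) {
        // Try it on values with the magnitude the simulations actually produce.
        System.out.println(roundTripError(0.5));
        System.out.println(roundTripError(0.1));
        System.out.println(roundTripError(123456789.0));
    }
}
```

Values that are exactly representable in 24 significant bits, such as 0.5, survive unchanged. Most others do not, and the absolute error grows with the magnitude of the value.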

JeremyP answered Sep 28 '22 19:09


Just to clarify, do you need ALL of the information in memory at once? It sounds like you are doing financial simulations (maybe credit risk?). Say you are running 30 deals: do you need to store all of the values in memory? Or can you run the first deal (~900,000 * 64 bits), then discard the list of doubles (serialize it to disk or something) and proceed with the next? I thought this might be okay, as you say the deals are independent of one another.

Apologies if this sounds patronising; I'm just trying to get a proper idea of the problem.

Tom Chantler answered Sep 28 '22 17:09