I'm working in Java. I have a requirement to compare the results of two database queries. To do this, I take each row of the result set and put it into a Hashtable, with the field name as the key and the field's data as the value. I then collect all of the Hashtables for a result set into a single Vector, which serves only as a container. So to compare two queries, I'm really iterating through two Vectors of Hashtables.
This approach works really well for me, but it requires a lot of memory. Because of other design requirements, I have to do this comparison with a Vector-of-Hashtables-like structure, and not with a database-side procedure.
Does anyone have any suggestions for optimization? The optimal solution would be one that is somewhat similar to what I am doing now as most of the code is already designed around it.
Thanks
Provide more memory to your JVM (usually with -Xmx / -Xms), or don't load all the data into memory. For many operations on huge amounts of data, there are algorithms that don't need access to all of it at once. One class of such algorithms is divide-and-conquer algorithms.
public ArrayList(int initialCapacity)
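The constructor above suggests pre-sizing the container. As a rough sketch, assuming the row count and column count are known (or can be estimated) before the result set is read, the Vector and Hashtable could be swapped for a pre-sized ArrayList and HashMap. The class and method names here (PresizedRows, newRowList, newRow) are made up for illustration, and the saving is likely modest: it only avoids unused capacity left over from growth, not the per-row cost of the data itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: names are illustrative, not from the original post.
class PresizedRows {

    // Allocates the backing array once instead of letting it grow step by step.
    static List<Map<String, Object>> newRowList(int expectedRows) {
        return new ArrayList<>(expectedRows);
    }

    // Sizes the map so that columnCount entries fit without a rehash
    // at HashMap's default load factor of 0.75.
    static Map<String, Object> newRow(int columnCount) {
        return new HashMap<>((int) (columnCount / 0.75f) + 1);
    }
}
```

Because ArrayList and HashMap implement List and Map, code that only iterates over the rows and looks up fields by name should need few changes. Unlike Vector and Hashtable, they are not synchronized, so this assumes the structure is used from a single thread.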
Use an in-process, in-memory database like H2, keeping its limitations in mind (H2 can even rely on its own in-memory file system). Use off-process memory storage like Memcached with a corresponding Java client. Set up a RAM disk (or use tmpfs, or something similar) and work with memory as a file system from Java.
Specify the same ORDER BY clause (based on the "key") for both result sets. Then you only have to have one record from each result set in memory at once.
For example, say your results are res1 and res2.
If the key field of res1 is less than the key field of res2, res2 is missing some records; iterate res1 until its key field is equal to or greater than the key of res2.
Likewise, if the key field of res1 is greater than the key field of res2, res1 is missing some records; iterate res2 instead.
If the key fields of the current records are equal, you can compare their values, then iterate both result sets.
You can see, in this manner, that only one record from each result is required to be held in memory at a given time.
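The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes an integer key and that both inputs are already sorted by that key (which is what the ORDER BY provides). The Row class, the compare method, and the message strings are hypothetical names invented for the example. Plain iterators stand in for the two result sets so the sketch runs without a database; in real code each iterator would wrap a JDBC ResultSet and build one Row per call to next().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: class, method, and message names are illustrative only.
class SortedResultComparer {

    // One record: its sort key plus the field values to compare.
    static final class Row {
        final int key;
        final Map<String, Object> values;

        Row(int key, Map<String, Object> values) {
            this.key = key;
            this.values = values;
        }
    }

    // Returns the next row, or null once the input is exhausted.
    private static Row advance(Iterator<Row> it) {
        return it.hasNext() ? it.next() : null;
    }

    // Walks both key-sorted inputs in step, holding one row from each at a time.
    static List<String> compare(Iterator<Row> res1, Iterator<Row> res2) {
        List<String> differences = new ArrayList<>();
        Row r1 = advance(res1);
        Row r2 = advance(res2);

        while (r1 != null || r2 != null) {
            if (r2 == null || (r1 != null && r1.key < r2.key)) {
                // res1 is behind: res2 has no record with this key.
                differences.add("key " + r1.key + " missing from res2");
                r1 = advance(res1);
            } else if (r1 == null || r1.key > r2.key) {
                // res2 is behind: res1 has no record with this key.
                differences.add("key " + r2.key + " missing from res1");
                r2 = advance(res2);
            } else {
                // Keys match: compare the values, then move both forward.
                if (!Objects.equals(r1.values, r2.values)) {
                    differences.add("key " + r1.key + " differs");
                }
                r1 = advance(res1);
                r2 = advance(res2);
            }
        }
        return differences;
    }
}
```

Only the list of differences grows with the data; the rows themselves are discarded as soon as the comparison moves past them. If the differences can also be large, they could be logged or written out as they are found instead of being collected.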