How to handle large data sets in Java without using too much memory

Tags:

java

I'm working in Java and need to compare the results of two database queries. To do this, I take each row of the result set and store it in a Hashtable, with the field name as the 'key' and the data in the field as the 'value'. I then collect all of the Hashtables for a result set in a single Vector, just as a container. So to compare two queries, I'm really iterating through two Vectors of Hashtables.

I've found that this approach works really well for me but requires a lot of memory. Because of other design requirements, I have to do this comparison via a Vector-Hashtable-like structure, and not some DB-side procedure.

Does anyone have any suggestions for optimization? The ideal solution would be somewhat similar to what I am doing now, since most of the code is already designed around it.

Thanks

295

asked Aug 24 '10 20:08

Tyler

1 Answer

Specify the same ORDER BY clause (based on the "key") for both result sets. Then you only have to have one record from each result set in memory at once.

For example, say your results are res1 and res2.

If the key field of res1 is less than the key field of res2, res2 is missing some records; iterate res1 until its key field is equal to or greater than the key of res2.

Likewise, if the key field of res1 is greater than the key field of res2, res1 is missing some records; iterate res2 instead.

If the key fields of the current records are equal, you can compare their values, then iterate both result sets.

In this manner, only one record from each result set needs to be held in memory at any given time.
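
The walk described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name `SortedRowDiff`, the method `diff`, and the report strings are all hypothetical. It assumes each row arrives as a `Map` of field name to value (close to the Hashtable-per-row layout in the question), that both inputs are already sorted ascending by the key field, that keys are unique, non-null, and mutually `Comparable`, and that Java's ordering of the keys agrees with the database's `ORDER BY` ordering.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class SortedRowDiff {

    /**
     * Walks two row streams that are both sorted ascending by keyField and
     * returns a description of every difference found. Only the current row
     * of each stream is held while walking.
     */
    public static List<String> diff(Iterator<Map<String, Object>> res1,
                                    Iterator<Map<String, Object>> res2,
                                    String keyField) {
        List<String> report = new ArrayList<>();
        Map<String, Object> r1 = next(res1);
        Map<String, Object> r2 = next(res2);

        while (r1 != null || r2 != null) {
            int cmp;
            if (r1 == null) {
                cmp = 1;   // res1 is exhausted: everything left is only in res2
            } else if (r2 == null) {
                cmp = -1;  // res2 is exhausted: everything left is only in res1
            } else {
                cmp = compareKeys(r1.get(keyField), r2.get(keyField));
            }

            if (cmp < 0) {
                // res2 has no record with this key; advance res1 only
                report.add("only in res1: " + r1.get(keyField));
                r1 = next(res1);
            } else if (cmp > 0) {
                // res1 has no record with this key; advance res2 only
                report.add("only in res2: " + r2.get(keyField));
                r2 = next(res2);
            } else {
                // same key: compare the remaining fields, then advance both
                if (!r1.equals(r2)) {
                    report.add("differs: " + r1.get(keyField));
                }
                r1 = next(res1);
                r2 = next(res2);
            }
        }
        return report;
    }

    // Returns the next row, or null once the stream is exhausted.
    private static Map<String, Object> next(Iterator<Map<String, Object>> it) {
        return it.hasNext() ? it.next() : null;
    }

    // Assumes both keys are non-null and of the same Comparable type.
    @SuppressWarnings("unchecked")
    private static int compareKeys(Object a, Object b) {
        return ((Comparable<Object>) a).compareTo(b);
    }
}
```

To get the memory saving, the two iterators would need to produce rows lazily, for example from a small adapter that builds one `Map` per call from an open `ResultSet`, rather than from a fully loaded Vector. The sketch collects differences in a list for simplicity; if many rows are expected to differ, reporting each difference as it is found would avoid holding the list in memory.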

187

answered Oct 16 '22 12:10

erickson
