Very open question: I need to write a Java client that reads millions of records (let's say account information) from an Oracle database, dumps them into XML, and sends them through web services to a vendor.
What is the most optimized way to do this, starting from fetching the millions of records? I went the JPA/Hibernate route and got OutOfMemory errors fetching 2 million records.
Is JDBC a better approach? Fetch each row and build the XML as I go? Any other alternatives?
I am not an expert in Java, so any guidance is appreciated.
We faced a similar problem some time back, and our record count was in excess of 2 million. This is how we approached it.
Using any O/R mapping tool was simply ruled out due to large overheads, like the creation of huge numbers of POJOs, which is not required if the data is just going to be dumped to XML.
Plain JDBC is the way to go. Its main advantage is that it returns a ResultSet object which does not actually hold all the results at once, so loading the entire data set into memory is no longer a problem. The data is loaded as we iterate over the ResultSet.
Next comes the creation of the XML file. We create the XML file and open it in append mode.
In the loop where we iterate over the ResultSet, we create XML fragments and append them to the XML file. This goes on until the entire ResultSet has been iterated.
In the end, what we have is an XML file with all the records.
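A minimal sketch of that loop, assuming a hypothetical ACCOUNTS table with ACCOUNT_ID, NAME, and BALANCE columns; the connection details and output path are placeholders. It uses a streaming XMLStreamWriter rather than literally reopening the file in append mode, which has the same effect of never holding more than the current row in memory:

```java
import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class AccountExporter {
    public static void main(String[] args) throws Exception {
        // Connection URL, credentials, table and column names are all placeholders.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT account_id, name, balance FROM accounts")) {

            ps.setFetchSize(1000); // pull rows from Oracle in batches instead of one by one

            try (FileOutputStream out = new FileOutputStream("accounts.xml");
                 ResultSet rs = ps.executeQuery()) {

                XMLStreamWriter xml = XMLOutputFactory.newInstance()
                        .createXMLStreamWriter(out, "UTF-8");
                xml.writeStartDocument("UTF-8", "1.0");
                xml.writeStartElement("accounts");

                // Each row becomes one XML fragment; nothing is retained after it is written.
                while (rs.next()) {
                    xml.writeStartElement("account");
                    xml.writeAttribute("id", rs.getString("account_id"));
                    xml.writeStartElement("name");
                    xml.writeCharacters(rs.getString("name"));
                    xml.writeEndElement();
                    xml.writeStartElement("balance");
                    xml.writeCharacters(rs.getString("balance"));
                    xml.writeEndElement();
                    xml.writeEndElement(); // </account>
                }

                xml.writeEndElement(); // </accounts>
                xml.writeEndDocument();
                xml.flush();
                xml.close();
            }
        }
    }
}
```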
Now, for sharing this file, we created a web service which returns the URL to this XML file (archived/zipped) once the file is available.
The client can download this file at any time after that.
Note that this is not a synchronous system, meaning the file does not become available right after the client makes the call. Since creating the XML takes a lot of time, an HTTP request would normally time out, hence this approach.
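As a rough illustration of that "return a URL once the file is ready" idea, a hypothetical JAX-RS resource might look like this (the file path and download URL are made up):

```java
import java.io.File;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

// Hypothetical endpoint: the client polls this until the export has been produced.
@Path("/accounts-export")
public class ExportStatusResource {

    private static final File EXPORT_FILE = new File("/data/exports/accounts.zip"); // placeholder

    @GET
    public Response exportLocation() {
        if (EXPORT_FILE.exists()) {
            // File is ready: hand back a download URL instead of streaming the payload here.
            return Response.ok("https://files.example.com/exports/accounts.zip").build();
        }
        // Not ready yet: tell the client to come back later.
        return Response.status(Response.Status.NOT_FOUND)
                       .entity("Export not available yet")
                       .build();
    }
}
```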
Just an approach you can take a cue from. Hope this helps.
Use ResultSet#setFetchSize() to optimize the number of records fetched at a time from the database.
See What does Statement.setFetchSize(nSize) method really do in SQL Server JDBC driver?
In JDBC, the ResultSet#setFetchSize(int) method is very important to performance and memory management within the JVM, as it controls the number of network calls from the JVM to the database and, correspondingly, the amount of RAM used for ResultSet processing.
Read here about Oracle ResultSet Fetch Size.
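For illustration, a small fragment (the query is a placeholder and `connection` is assumed to be an open JDBC connection). The Oracle driver defaults to 10 rows per round trip, which is far too small for an export of this size:

```java
PreparedStatement ps = connection.prepareStatement(
        "SELECT account_id, name, balance FROM accounts"); // placeholder query
ps.setFetchSize(5000);            // rows pulled per network round trip
ResultSet rs = ps.executeQuery();
rs.setFetchSize(5000);            // can also be adjusted on the ResultSet while iterating
```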
For this size of data, you can probably get away with starting Java with more memory. Check out using -Xmx and -Xms when you start Java.
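For example (the jar name here is just a placeholder):

```
java -Xms512m -Xmx4g -jar account-exporter.jar
```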
If your data is truly too big to fit in memory, but not big enough to warrant investment in different technology, think about operating in chunks. Does this have to be done at once? Can you slice up the data into 10 chunks and do each chunk independently? If it has to be done in one shot, can you stream data from the database, and then stream it into the file, forgetting about things you are done with (to keep memory use in the JVM low)?
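A rough sketch of that chunked approach, assuming a hypothetical ACCOUNTS table keyed by ACCOUNT_ID and a hypothetical writeXmlFragment helper that appends one row to the current chunk's XML file (the FETCH FIRST syntax needs Oracle 12c or later):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ChunkedExport {

    // Hypothetical helper: appends one row to the current chunk's XML file.
    static void writeXmlFragment(ResultSet rs) { /* ... */ }

    // Keyset pagination: each pass picks up where the previous one stopped,
    // so no single query result has to fit in memory.
    static void exportInChunks(Connection connection) throws Exception {
        String sql = "SELECT account_id, name, balance FROM accounts "   // placeholder table/columns
                   + "WHERE account_id > ? ORDER BY account_id FETCH FIRST 200000 ROWS ONLY";
        long lastId = 0;
        boolean more = true;
        while (more) {
            more = false;
            try (PreparedStatement ps = connection.prepareStatement(sql)) {
                ps.setFetchSize(1000);
                ps.setLong(1, lastId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("account_id");
                        writeXmlFragment(rs);
                        more = true;
                    }
                }
            }
        }
    }
}
```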