 

Fetching millions of records in Java [closed]

Very open question: I need to write a Java client that reads millions of records (say, account information) from an Oracle database, dumps them into XML, and sends them through web services to a vendor.

What is the most optimized way to do this, starting with fetching the millions of records? I went the JPA/Hibernate route and got OutOfMemoryError failures when fetching 2 million records.

Is JDBC a better approach? Should I fetch each row and build the XML as I go? Any other alternatives?

I am not an expert in Java so any guidance is appreciated.

— user955165, asked Apr 10 '14


3 Answers

We faced a similar problem some time back, with a record count in excess of 2M. This is how we approached it:

  • Using any O/R mapping tool is simply ruled out due to large overheads like the creation of large numbers of POJOs, which is not required if the data is only going to be dumped to XML.

  • Plain JDBC is the way to go. The main advantage is that it returns a ResultSet object which does not actually hold all the results at once, so loading the entire data set into memory is no longer a problem. The data is loaded as we iterate over the ResultSet (see the sketch after this list).

  • Next comes the creation of the XML file. We create an XML file and open it in append mode.

  • Now, in a loop where we iterate over the ResultSet object, we create XML fragments and append them to the XML file. This goes on until the entire ResultSet has been iterated.

  • In the end, what we have is an XML file with all the records.

  • Now, for sharing this file, we created a web service which returns the URL of this XML file (archived/zipped) once the file is available.

  • The client can then download this file at any time.

  • Note that this is not a synchronous system, meaning the file does not become available right after the client makes the call. Since creating the XML takes a lot of time, an HTTP call would normally time out; hence this approach.
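
A minimal sketch of the loop described above, assuming made-up connection details, table, and column names, and using the standard StAX XMLStreamWriter to stream records straight to the file instead of hand-appending text fragments:

```java
import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class AccountExporter {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and schema -- replace with your own.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
             PreparedStatement ps = con.prepareStatement(
                "SELECT account_id, account_name FROM accounts")) {

            ps.setFetchSize(1000); // pull rows over the wire in batches

            try (FileOutputStream out = new FileOutputStream("accounts.xml");
                 ResultSet rs = ps.executeQuery()) {
                XMLStreamWriter xml = XMLOutputFactory.newInstance()
                        .createXMLStreamWriter(out, "UTF-8");
                xml.writeStartDocument("UTF-8", "1.0");
                xml.writeStartElement("accounts");

                // Each row is written out immediately and then discarded,
                // so heap usage stays flat regardless of the row count.
                while (rs.next()) {
                    xml.writeStartElement("account");
                    xml.writeAttribute("id", rs.getString("account_id"));
                    xml.writeCharacters(rs.getString("account_name"));
                    xml.writeEndElement();
                }

                xml.writeEndElement();
                xml.writeEndDocument();
                xml.close();
            }
        }
    }
}
```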

Just an approach you can take cues from. Hope this helps.

— Santosh


Use ResultSet#setFetchSize() to tune how many records are fetched at a time from the database.

See What does Statement.setFetchSize(nSize) method really do in SQL Server JDBC driver?

In JDBC, the ResultSet#setFetchSize(int) method is very important to performance and memory-management within the JVM as it controls the number of network calls from the JVM to the database and correspondingly the amount of RAM used for ResultSet processing.

Read here about Oracle ResultSet Fetch Size
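
To illustrate, a fragment against a hypothetical accounts table (con is an open java.sql.Connection and process is a made-up row handler). Note that Oracle's JDBC driver defaults to fetching only 10 rows per round trip, which is far too small for an export of this size:

```java
// con is an open java.sql.Connection (assumption for this fragment).
try (PreparedStatement ps = con.prepareStatement(
        "SELECT account_id, account_name FROM accounts")) {
    ps.setFetchSize(500); // rows fetched per network round trip (Oracle default: 10)
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            process(rs); // only ~500 rows are buffered in the JVM at any moment
        }
    }
}
```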

— Braj


For this size of data, you can probably get away with starting Java with more memory. Check out the -Xmx and -Xms options when you start Java.
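
For example, an illustrative launch line (the jar name and heap sizes are placeholders to tune for your data):

```
java -Xms2g -Xmx4g -jar exporter.jar
```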

If your data is truly too big to fit in memory, but not big enough to warrant investment in a different technology, think about operating in chunks. Does this have to be done all at once? Can you slice the data into 10 chunks and process each chunk independently? If it has to be done in one shot, can you stream data from the database and then stream it into the file, forgetting about things you are done with (to keep memory use in the JVM low)? See the sketch below.
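
A sketch of the chunked variant, using keyset pagination on a hypothetical numeric primary key (the FETCH FIRST syntax needs Oracle 12c+; on older versions you would page with ROWNUM instead, and writeRecord is a made-up helper):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ChunkedExporter {

    // Walks the accounts table in chunks of 100k rows. Keyset pagination
    // (WHERE account_id > ?) stays cheap deep into the table, unlike
    // OFFSET-based paging, which rescans all the skipped rows.
    static void exportInChunks(Connection con) throws SQLException {
        long lastId = 0;
        boolean more = true;
        while (more) {
            more = false; // stays false if the chunk comes back empty
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT account_id, account_name FROM accounts "
                    + "WHERE account_id > ? ORDER BY account_id "
                    + "FETCH FIRST 100000 ROWS ONLY")) { // Oracle 12c+ syntax
                ps.setLong(1, lastId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("account_id");
                        writeRecord(rs); // hypothetical: append one record to the XML
                        more = true;
                    }
                }
            }
        }
    }

    static void writeRecord(ResultSet rs) throws SQLException {
        // append one <account> element to the output stream (omitted)
    }
}
```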

— Cory Kendall