I need to load 100 million+ rows from a MySQL database into memory. My Java program fails with java.lang.OutOfMemoryError: Java heap space.
I have 8 GB of RAM in my machine and I have given -Xmx6144m in my JVM options.
This is my code
public List<Record> loadTrainingDataSet() {
    ArrayList<Record> records = new ArrayList<Record>();
    try {
        Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
        s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");
        ResultSet rs = s.getResultSet();
        int count = 0;
        while (rs.next()) {
            // every row is retained in the list (Record constructor assumed), so the heap fills up
            records.add(new Record(rs.getInt("movie_id"), rs.getInt("customer_id"), rs.getInt("rating")));
            count++;
        }
    } catch (Exception e) {
        System.err.println("Cannot connect to database server " + e);
    }
    return records;
}
Any idea how to overcome this problem?
I came across this post, and based on the comments below I updated my code. It seems I am now able to load the data into memory with the same -Xmx6144m setting, but it takes a long time.
Here is my code.
...
import org.apache.mahout.math.SparseMatrix;
...
@Override
public SparseMatrix loadTrainingDataSet() {
    long t1 = System.currentTimeMillis();
    SparseMatrix ratings = new SparseMatrix(NUM_ROWS, NUM_COLS);
    int REC_START = 0;
    try {
        // fetch the table in 101 chunks of 1,000,000 rows (100,480,507 rows in total)
        for (int i = 1; i <= 101; i++) {
            long t11 = System.currentTimeMillis();
            Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                    java.sql.ResultSet.CONCUR_READ_ONLY);
            // Integer.MIN_VALUE tells MySQL Connector/J to stream rows
            // instead of buffering the whole result set in memory
            s.setFetchSize(Integer.MIN_VALUE);
            // LIMIT offset,count: skip REC_START rows, then read the next 1,000,000
            ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT "
                    + REC_START + ",1000000");
            while (rs.next()) {
                int movieId = rs.getInt("movie_id");
                int customerId = rs.getInt("customer_id");
                byte rating = (byte) rs.getInt("rating");
                ratings.set(customerId, movieId, rating);
            }
            long t22 = System.currentTimeMillis();
            System.out.println("Round " + i + " completed in " + (t22 - t11) / 1000 + " seconds");
            rs.close();
            s.close();
            REC_START += 1000000;
        }
    } catch (Exception e) {
        System.err.println("Cannot connect to database server " + e);
    } finally {
        if (conn != null) {
            try {
                conn.close();
                System.out.println("Database connection terminated");
            } catch (Exception e) { /* ignore close errors */ }
        }
    }
    long t2 = System.currentTimeMillis();
    System.out.println("Took " + (t2 - t1) / 1000 + " seconds");
    return ratings;
}
Loading the first 100,000 rows took 2 seconds. Loading the 29th batch of 100,000 rows took 46 seconds. I stopped the process midway since it was taking too long. Are these acceptable amounts of time? Is there a way to improve the performance of this code? I am running this on an 8 GB RAM, 64-bit Windows machine.
A hundred million records means that each record may take up at most 50 bytes in order to fit within 6 GB, plus some extra space for other allocations. In Java 50 bytes is nothing; a mere Object[] takes 32 bytes per element. You must find a way to use the results immediately inside your while (rs.next()) loop and not retain them in full.
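For example, here is a minimal sketch of that pattern (the method name and the aggregate are illustrative, not from the original post): each row is folded into a running total the moment it arrives, so nothing like a 100-million-element list is ever built.

import java.sql.*;

public double averageRating(Connection conn) throws SQLException {
    long count = 0;
    long sum = 0;
    Statement s = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
            ResultSet.CONCUR_READ_ONLY);
    s.setFetchSize(Integer.MIN_VALUE); // stream rows from MySQL one at a time
    ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");
    while (rs.next()) {
        sum += rs.getInt("rating"); // each row is used immediately, never stored
        count++;
    }
    rs.close();
    s.close();
    return count == 0 ? 0.0 : (double) sum / count;
}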
The problem is that I get the java.lang.OutOfMemoryError on the s.executeQuery(...) line itself.
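That happens because, by default, MySQL Connector/J retrieves and buffers the complete result set in memory before rs.next() is ever called. The driver's streaming mode avoids the buffering; it requires exactly this statement configuration (a minimal sketch, assuming conn is an open Connector/J connection):

// with these flags plus a fetch size of Integer.MIN_VALUE, Connector/J
// hands rows to the client one at a time instead of buffering all of them
Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
        java.sql.ResultSet.CONCUR_READ_ONLY);
s.setFetchSize(Integer.MIN_VALUE);
ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");

One caveat of streaming mode: the connection is tied up until the result set is fully read or closed, so no other statements can run on it in the meantime.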
You can split your query into multiple ones:

s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT 0,300"); // the first 300 rows
// process this first result
s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT 300,300"); // the next 300 rows, starting at offset 300
// process this second result
// etc.

Note that in MySQL the second argument of LIMIT is a row count, not an end offset. You can wrap this in a loop that stops when no more results are found, as sketched below.
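A minimal sketch of that loop (CHUNK and processRow are illustrative names; conn is assumed to be an open java.sql.Connection):

static final int CHUNK = 300;

void loadInChunks(Connection conn) throws SQLException {
    int offset = 0;
    boolean more = true;
    while (more) {
        more = false; // stays false if this chunk comes back empty
        try (Statement s = conn.createStatement();
             ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT "
                     + offset + "," + CHUNK)) {
            while (rs.next()) {
                more = true;
                // processRow is a placeholder for whatever per-row work you need
                processRow(rs.getInt("movie_id"), rs.getInt("customer_id"), rs.getInt("rating"));
            }
        }
        offset += CHUNK;
    }
}

Keep in mind that for each chunk MySQL still scans past all of the skipped rows, so chunks with a large offset get progressively slower; that matches the timings reported above.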