 

Fastest way to iterate through large table using JDBC


I'm trying to write a Java program to clean up and merge rows in my table. The table is large, about 500k rows, and my current solution is running very slowly. The first thing I want to do is simply get an in-memory array of objects representing all the rows of my table. Here is what I'm doing:

  • pick an increment of say 1000 rows at a time
  • use JDBC to fetch a ResultSet for a query like SELECT * FROM TABLE WHERE ID > 0 AND ID < 1000
  • add the resulting data to an in-memory array
  • continue querying all the way up to 500,000 in increments of 1000, adding the results each time.
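
In code, my loop looks roughly like this (a simplified sketch; the table name, column names, and connection setup are placeholders for my real ones):

List<Object[]> rows = new ArrayList<Object[]>();
Statement stmt = conn.createStatement();
for (int start = 0; start < 500000; start += 1000) {
    // fetch the next chunk of 1000 rows by id range
    ResultSet rs = stmt.executeQuery(
        "SELECT * FROM mytable WHERE id > " + start + " AND id <= " + (start + 1000));
    while (rs.next()) {
        // copy the columns I care about into a plain in-memory object
        rows.add(new Object[] { rs.getLong("id"), rs.getString("data") });
    }
    rs.close();
}
stmt.close();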

This is taking way too long. In fact, it's not even getting past the second increment, from 1000 to 2000. The query takes forever to finish (although when I run the same thing directly through a MySQL browser it's decently fast). It's been a while since I've used JDBC directly. Is there a faster alternative?

asked Jul 03 '09 by Ish


3 Answers

First of all, are you sure you need the whole table in memory? Maybe you should consider (if possible) selecting only the rows that you want to update/merge/etc. If you really have to have the whole table, you could consider using a scrollable ResultSet. You can create it like this:

// make sure autocommit is off (postgres)
con.setAutoCommit(false);

Statement stmt = con.createStatement(
        ResultSet.TYPE_SCROLL_INSENSITIVE, // or ResultSet.TYPE_FORWARD_ONLY
        ResultSet.CONCUR_READ_ONLY);
ResultSet srs = stmt.executeQuery("select * from ...");

It enables you to move to any row you want by using the 'absolute' and 'relative' methods.
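
For example, continuing from the snippet above (the row positions are just illustrative):

srs.absolute(1000);   // jump directly to row 1000
srs.relative(250);    // move forward 250 rows from there
srs.relative(-50);    // or move back 50 rows
while (srs.next()) {
    // process the remaining rows in order
}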

answered Sep 22 '22 by pablochan


Although it's probably not optimal, your solution seems like it ought to be fine for a one-off database cleanup routine. It shouldn't take that long to run a query like that and get the results (I'm assuming that since it's a one-off, a couple of seconds per query would be fine). Possible problems:

  • Is your network (or at least your connection to MySQL) very slow? If so, you could try running the process locally on the MySQL box, or on something better connected.

  • Is there something in the table structure that's causing it? Pulling down 10k of data for every row? 200 fields? Calculating the id values to fetch based on a non-indexed column? You could try finding a more db-friendly way of pulling the data (e.g. just the columns you need, as in the sketch below; having the db aggregate values; etc.).
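
For example, something along these lines is usually much cheaper than SELECT * (the column names here are made up, just to show the idea):

// pull only the columns the cleanup logic actually needs
ResultSet rs = stmt.executeQuery(
        "SELECT id, status FROM mytable WHERE id > 0 AND id <= 1000");
while (rs.next()) {
    long id = rs.getLong("id");
    String status = rs.getString("status");
    // keep just these two values in memory, not the whole row
}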

If you're not getting through the second increment, something is really wrong: efficient or not, you shouldn't have any problem dumping 2,000 or even 20,000 rows into memory on a running JVM. Maybe you're storing the data redundantly or extremely inefficiently?

answered Sep 22 '22 by Steve B.


One thing that helped me was Statement.setFetchSize(Integer.MIN_VALUE). I got this idea from Jason's blog. This cut execution time down by more than half, and memory consumption dropped dramatically, since only one row is read at a time.
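
In case it's useful, this is roughly how I set it up (assuming the MySQL Connector/J driver; the query is just an example):

Statement stmt = conn.createStatement(
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE); // hints the MySQL driver to stream rows one at a time
ResultSet rs = stmt.executeQuery("SELECT * FROM mytable");
while (rs.next()) {
    // process one row at a time; the full result set is never held in memory
}
rs.close();
stmt.close();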

This trick doesn't work for PreparedStatement, though.

answered Sep 26 '22 by Shashikant Kore