
Spring JDBC support and large dataset

When using one of the various JDBC template methods, I am confused about how to iterate/scroll over large result sets (which won't fit into memory). Even without direct exposure of an Iterable interface, I would at least expect instances of RowCallbackHandler to get called while the query is executing, not after it has finished (or the heap overflows).

I did have a look at this (which changed nothing for me, despite being similar in spirit to this post on Stack Overflow) and at this post in the Spring forums. The latter seems to suggest that the callback handler should indeed get called while the cursor is fetching data. My tests, however, show no such behaviour.

The database is Oracle 10g. I am using the 11.1.0.7.0-Production driver and Spring 2.5.6.SEC01. Does anyone have ideas on how to iterate over result sets, preferably while keeping the mapping logic of RowMapper etc.?

yawn asked Aug 27 '09 13:08


3 Answers

The Oracle JDBC driver has proper support for the setFetchSize() method on java.sql.Statement, which allows you to control how many rows the driver will fetch in one go.

However, RowMapper as used by Spring works by reading each row into memory, getting the RowMapper to translate it into an object, and storing each row's object in one big list. If your result set is huge, then this list will get big, regardless of how JDBC fetches the row data.

If you need to handle large result sets, then RowMapper isn't scalable. You might consider using RowCallbackHandler instead, along with the corresponding methods on JdbcTemplate. RowCallbackHandler doesn't dictate how the results are stored, leaving it up to you to process or store each row as it arrives.
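To illustrate the difference, here is a minimal plain-JDBC sketch of the callback style: a fetch size hint on the statement, and each row handed to a handler as it is read instead of being collected into a list. The class `StreamingQuery`, the interface `RowHandler`, and the method `forEachRow` are hypothetical names made up for this example, not part of Spring or JDBC, and the sketch assumes Java 7 or later for try-with-resources.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class StreamingQuery {

    /** Hypothetical callback, playing the role that RowCallbackHandler plays in Spring. */
    interface RowHandler {
        void processRow(ResultSet rs) throws SQLException;
    }

    /**
     * Runs the query and passes each row to the handler as soon as it is read.
     * Nothing is accumulated here, so memory use should stay flat as long as
     * the driver honours the fetch size hint and the handler keeps no rows.
     * Returns the number of rows processed.
     */
    static int forEachRow(Connection con, String sql, int fetchSize, RowHandler handler)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setFetchSize(fetchSize); // hint: rows fetched per round trip
            try (ResultSet rs = ps.executeQuery()) {
                int rows = 0;
                while (rs.next()) {
                    handler.processRow(rs); // handle the current row, then move on
                    rows++;
                }
                return rows;
            }
        }
    }
}
```

With Spring itself, the corresponding pieces would be `JdbcTemplate.setFetchSize(int)` on the template and `JdbcTemplate.query(String, RowCallbackHandler)` for the query, with the per-row work placed in the handler's `processRow(ResultSet)` method.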

skaffman answered Oct 08 '22 18:10


You can use the springjdbc-iterable library:

CloseableIterator<MyObj> iter = jt.queryForIter("select ...", params, mapper);

The iterator is closed automatically on exhaustion, or it can be closed manually. It works only within transaction bounds.

Disclaimer: I wrote this library

alexkasko answered Oct 08 '22 17:10


It's a property of the driver/connection whether to stream data back to you or whether to send it back in one chunk. For example, in SQL Server, you use the SelectMethod property on the connection URL:

jdbc:microsoft:sqlserver://gsasql03:1433;DatabaseName=my_db;SelectMethod=direct

The value of direct means that the results should come in one go. The other choice is cursor, which allows you to specify that you want the connection to stream results back to you. I'm afraid I'm not sure what the analog for an Oracle data source is.

The RowCallbackHandler certainly works for me.

oxbow_lakes answered Oct 08 '22 18:10