
JDBC setMaxRows database usage

I am trying to write a database-independent application with JDBC. I now need a way to fetch the top N entries from a table. I saw that JDBC has a setMaxRows method, but I am not comfortable using it: I am afraid the database will produce the full result set and only the JDBC driver will truncate it. If I need the top 5 results from a table with a billion rows, this will break my neck (the table has a usable index).

Writing special SQL statements for every kind of database isn't very nice, but it will let the database do clever query planning and stop fetching more results than necessary.

Can I rely on setMaxRows to tell the database not to do too much work?

I guess that in the worst case I can't rely on this working the way I hope. I'm mostly interested in Postgres 9.1 and Oracle 11.2, so if someone has experience with these databases, please step forward.

Franz Kafka asked Apr 16 '12 14:04


2 Answers

"... will let the database do clever query planning and stop fetching more results than necessary."

If you use

PostgreSQL:

SELECT * FROM tbl ORDER BY col1 LIMIT 10; -- slow without index

Or:

SELECT * FROM tbl LIMIT 10;               -- fast even without index

Oracle:

SELECT *
FROM   (SELECT * FROM tbl ORDER BY col1 DESC)
WHERE  ROWNUM <= 10;

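... then only 10 rows will be returned. But if you sort the rows before picking the top 10, basically all qualifying rows have to be read before they can be sorted.

A matching index on the sort column can prevent this overhead: the database can walk the index in order and stop after the first 10 entries.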
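To keep the per-database SQL in one place, the two variants above could be hidden behind a small helper. This is only a sketch: the class TopNQuery and its methods are hypothetical names, and the product-name strings it matches ("PostgreSQL", "Oracle") are assumed to be what DatabaseMetaData.getDatabaseProductName() reports for your drivers, which you should verify.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper, not part of JDBC: wraps an already ordered query
// in the row-limiting syntax of the target database.
final class TopNQuery {

    private TopNQuery() {}

    // productName is expected to be the value reported by
    // DatabaseMetaData.getDatabaseProductName(). The strings matched here
    // are assumptions; check them against the drivers you actually use.
    // orderedQuery must be trusted SQL, it is concatenated as-is.
    static String limit(String productName, String orderedQuery, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        if ("PostgreSQL".equalsIgnoreCase(productName)) {
            return orderedQuery + " LIMIT " + n;
        }
        if ("Oracle".equalsIgnoreCase(productName)) {
            // ROWNUM is assigned before ORDER BY is applied,
            // so the sorted query has to go into a subquery.
            return "SELECT * FROM (" + orderedQuery + ") WHERE ROWNUM <= " + n;
        }
        throw new IllegalArgumentException("No top-N syntax known for: " + productName);
    }

    // Convenience variant that asks the live connection which database it talks to.
    static String limitFor(Connection conn, String orderedQuery, int n) throws SQLException {
        return limit(conn.getMetaData().getDatabaseProductName(), orderedQuery, n);
    }
}
```

For example, TopNQuery.limit("Oracle", "SELECT * FROM tbl ORDER BY col1 DESC", 10) yields SELECT * FROM (SELECT * FROM tbl ORDER BY col1 DESC) WHERE ROWNUM <= 10. Because the limit is part of the SQL text, the database sees it regardless of what the driver does with setMaxRows.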


If you are unsure what JDBC actually sends to the database server, run a test and have the database engine log the statements it receives. In PostgreSQL, you can set this in postgresql.conf:

log_statement = all

(and reload) to log all statements sent to the server. Be sure to reset that setting after the test or your log files may grow huge.
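A minimal probe for such a test might look like the sketch below. MaxRowsProbe and countRows are hypothetical names; the code only uses the standard java.sql calls Statement.setMaxRows and Statement.executeQuery. It does not show by itself whether the driver forwards the cap to the server; the idea is to run it with statement logging enabled and compare the logged statement and the query's run time with and without the cap.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical probe: runs a query with a row cap set through JDBC only
// and reports how many rows actually came back to the client.
final class MaxRowsProbe {

    private MaxRowsProbe() {}

    static int countRows(Connection conn, String sql, int maxRows) throws SQLException {
        try (Statement st = conn.createStatement()) {
            // The cap is requested via the JDBC API; the SQL text is left untouched.
            st.setMaxRows(maxRows);
            try (ResultSet rs = st.executeQuery(sql)) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows;
            }
        }
    }
}
```

The returned count only tells you what reached the client. What the server was asked to do has to be read from the server log or from the query's execution time.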

Erwin Brandstetter answered Oct 04 '22 19:10


The thing that could kill you with billions of rows is the (highly likely) ORDER BY clause in your query. If this order cannot be established using an index, then ... it'll break your neck :)

I would not depend on the JDBC driver here. As a previous comment suggests, it's unclear what it really does across different RDBMSs.

If you are concerned about the speed of your query, you can use a LIMIT clause as well. With LIMIT you can at least be sure that the limit is passed on to the DB server.

Edit: Sorry, I was not aware that Oracle doesn't support LIMIT.

MartinK answered Oct 04 '22 21:10