I am running queries against Hive. The same queries are supposed to work with other JDBC drivers, that is, with other relational databases.
I can't use the method Statement.setFetchSize, because it is not supported in Hive JDBC 0.13.0.
I am trying to work around this, and I came across another, similar method: Statement.setMaxRows.
In which cases should I use Statement.setMaxRows vs. Statement.setFetchSize?
Is it possible to use them interchangeably?
Thanks.
The setFetchSize(int) method defines the number of rows that will be read from the database when the ResultSet needs more rows. It affects how the database returns the ResultSet data. The setMaxRows(int) method of the Statement, by contrast, caps the total number of rows that any ResultSet produced by that Statement can contain.
According to the JDBC specification, the Statement.setMaxRows(int maxRows) method "sets the limit for the maximum number of rows that any ResultSet object generated by this Statement object can contain to the given number. If the limit is exceeded, the excess rows are silently dropped."
The result set fetch size, either set explicitly, or by default equal to the statement fetch size that was passed to it, determines the number of rows that are retrieved in any subsequent trips to the database for that result set.
In other words, the fetch size specifies the number of rows to be fetched from the database when additional rows are needed.
No, you can't use them interchangeably; they do different things. setMaxRows is the number of rows that can be returned overall. setFetchSize is the number of rows that will be returned in each database round trip. In the words of the Javadoc:
setFetchSize: Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement.
setMaxRows: Sets the limit for the maximum number of rows that any ResultSet object generated by this Statement object can contain to the given number.
In fact, since setFetchSize is only a hint, the driver is free to ignore it and do what it sees fit. So don't worry about Hive JDBC not supporting it.
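If the same code has to run against several drivers, one option is to treat the fetch size as optional and carry on when a driver rejects it. The following is a minimal sketch, not a definitive implementation: the class name FetchHints and the method trySetFetchSize are made up for illustration, and it assumes the driver signals the missing feature by throwing SQLException (or its subclass SQLFeatureNotSupportedException) from setFetchSize.

```java
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of JDBC or of the Hive driver.
final class FetchHints {

    private FetchHints() {
    }

    /**
     * Tries to pass a fetch-size hint to the driver.
     *
     * @return true if the driver accepted the call, false if it threw.
     */
    static boolean trySetFetchSize(Statement stmt, int rows) {
        try {
            stmt.setFetchSize(rows);
            return true;
        } catch (SQLException e) {
            // Assumption: an unsupported setFetchSize surfaces as SQLException.
            // SQLFeatureNotSupportedException is a subclass, so it lands here too.
            // The fetch size is only a hint, so the query can run without it.
            return false;
        }
    }
}
```

Calling FetchHints.trySetFetchSize(stmt, 500) before executeQuery keeps the hint on drivers that honor it and falls back to the driver's default elsewhere. Note that this sketch also swallows errors such as a closed statement; those will normally show up again on the next call against the statement.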
Note that all setMaxRows does is reduce the size of the ResultSet object. It won't affect the speed of the query. setMaxRows doesn't change the actual SQL (for example by adding top/limit/rownum), so it doesn't change the work the DB does. If there are more results than your limit, the query still produces them, and they are then truncated to fit your ResultSet.
This answer does a good job of explaining why setFetchSize is important. It is:
very important to performance and memory-management within the JVM as it controls the number of network calls from the JVM to the database and correspondingly the amount of RAM used for ResultSet processing.
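To make the difference concrete, here is a rough back-of-the-envelope model of the two settings. It is only a sketch under simplifying assumptions: the class FetchModel and its methods are hypothetical, it assumes the driver honors the fetch size exactly and fetches one batch per round trip, and real drivers may ignore the hint, prefetch, or stream rows differently.

```java
// Hypothetical model, not a JDBC API: it only does the arithmetic.
final class FetchModel {

    private FetchModel() {
    }

    /** Rows the caller can read. In JDBC, maxRows == 0 means "no limit". */
    static long visibleRows(long totalRows, int maxRows) {
        return maxRows == 0 ? totalRows : Math.min(totalRows, maxRows);
    }

    /**
     * Approximate number of round trips needed to read every visible row,
     * assuming each trip returns at most fetchSize rows.
     */
    static long roundTrips(long totalRows, int maxRows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive in this model");
        }
        long visible = visibleRows(totalRows, maxRows);
        // Ceiling division: a partially filled last batch still costs one trip.
        return (visible + fetchSize - 1) / fetchSize;
    }
}
```

In this model, a query matching 10,000 rows read with a fetch size of 100 takes 100 round trips, and raising the fetch size to 1,000 cuts that to 10 at the cost of holding more rows in memory per batch. Setting maxRows to 500 instead changes what the caller sees: only 500 rows are readable, in 5 trips at a fetch size of 100.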